二进制部署 kubernetes v1.11.x 高可用集群

由于Kubernetes已经release了1.15版本,此文档不再维护!

更新记录

  • 2019年2月13日由于runc逃逸漏洞CVE-2019-5736,根据kubernetes的文档建议,修改docker-ce版本为18.09.2
  • 2019年1月7日添加基于ingress-nginx使用域名+HTTPS的方式访问kubernetes-Dashboard
  • 2019年1月2日添加RBAC规则,修复kube-apiserver无法访问kubelet的问题
  • 2019年1月1日调整master节点和worker节点的操作步骤,添加CoreDNS的configmap中的hosts静态解析
  • 2018年12月28日修改kube-prometheus部分,修复Prometheus的Targets无法发现的问题
  • 2018年12月26日修改kubernetes-dashboard链接指向
  • 2018年12月25日修改kubelet.config.file路径问题
  • 2018年12月18日修改kubelet和kube-proxy启动时加载config file
  • 2018年12月17日添加EFK部署内容
  • 2018年12月16日添加prometheus-operator部署内容
  • 2018年12月14日添加helm部署内容,拆分etcd的server证书和client证书
  • 2018年12月13日添加rook-ceph部署内容
  • 2018年12月12日添加Metrics-Server内容
  • 2018年12月11日添加Dashboard、Ingress内容
  • 2018年12月10日添加kube-flannel、calico、CoreDNS内容
  • 2018年12月9日分拆master节点和worker节点的内容
  • 2018年12月8日初稿

介绍

本次部署方式为二进制可执行文件的方式部署

  • 注意请根据自己的实际情况调整
  • 对于生产环境部署,请注意某些参数的选择

如无特殊说明,均在k8s-m1节点上执行

参考博文

感谢两位大佬的文章,这里整合了两位的内容,并结合自己的理解整理成本文

软件版本

网络信息

  • 基于CNI的模式实现容器网络
  • Cluster IP CIDR: 10.244.0.0/16
  • Service Cluster IP CIDR: 10.96.0.0/12
  • Service DNS IP: 10.96.0.10
  • Kubernetes API VIP: 172.16.80.200

节点信息

  • 操作系统可采用 Ubuntu Server 16.04+ 或 CentOS 7.4+,本文使用CentOS 7.6 (1810) Minimal
  • keepalived提供VIP
  • haproxy提供kube-apiserver四层负载均衡
  • 由于实验环境受限,以3台服务器同时作为master和worker节点运行
  • 服务器配置请根据实际情况适当调整
IP地址          主机名   角色            CPU  内存
172.16.80.201   k8s-m1   master+worker   4    8G
172.16.80.202   k8s-m2   master+worker   4    8G
172.16.80.203   k8s-m3   master+worker   4    8G

目录说明

  • /usr/local/bin/:存放kubernetes和etcd二进制文件
  • /opt/cni/bin/: 存放cni-plugin二进制文件
  • /etc/etcd/:存放etcd配置文件和SSL证书
  • /etc/kubernetes/:存放kubernetes配置和SSL证书
  • /etc/cni/net.d/:安装CNI插件后会在这里生成配置文件
  • $HOME/.kube/:kubectl命令会在家目录下建立此目录,用于保存访问kubernetes集群的配置和缓存
  • $HOME/.helm/:helm命令会建立此目录,用于保存helm缓存和repository信息

事前准备

事前准备的内容需要在所有服务器上完成

部署过程以root用户完成

  • 所有服务器网络互通;k8s-m1可以通过SSH密钥免密登录到其他master节点,用于分发文件(可参考下面的示例)
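下面给出一个在k8s-m1上生成并分发SSH密钥的简单示例(仅作示意,假设使用root用户和默认的~/.ssh/id_rsa路径,请按实际情况调整):

# 在k8s-m1上生成密钥对
ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa
# 将公钥分发到所有节点(包括k8s-m1自身),过程中需要输入各节点的root密码
for NODE in k8s-m1 k8s-m2 k8s-m3;do
  ssh-copy-id -i ~/.ssh/id_rsa.pub root@${NODE}
done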
  • 编辑/etc/hosts
cat > /etc/hosts <<EOF
127.0.0.1 localhost
172.16.80.200 k8s-vip
172.16.80.201 k8s-m1
172.16.80.202 k8s-m2
172.16.80.203 k8s-m3
EOF
  • 时间同步服务

集群系统需要各节点时间同步

参考链接:RHEL7官方文档

这里使用公网对时,如果需要内网对时,请自行配置

yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
  • 关闭firewalld和SELINUX(可根据实际情况自行决定关闭不需要的服务)
systemctl stop firewalld
systemctl disable firewalld
systemctl mask firewalld
# 清空iptables规则
iptables -t filter -F
iptables -t filter -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -t raw -F
iptables -t raw -X
iptables -t security -F
iptables -t security -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
setenforce 0
sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config
  • 禁用swap
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
  • 添加sysctl参数
cat > /etc/sysctl.d/centos.conf <<EOF 
# 最大文件句柄数
fs.file-max=1024000
# 最大文件打开数
fs.nr_open=1024000
# 端口最大的监听队列的长度
net.core.somaxconn=4096
# 在CentOS7.4引入了一个新的参数来控制内核的行为。
# /proc/sys/fs/may_detach_mounts 默认设置为0
# 当系统有容器运行的时候,需要将该值设置为1。
fs.may_detach_mounts = 1
# 二层的网桥在转发包时也会被iptables的FORWARD规则所过滤
net.bridge.bridge-nf-call-arptables=1
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
# 关闭严格校验数据包的反向路径
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
# 打开ipv4数据包转发
net.ipv4.ip_forward=1
# 允许应用程序能够绑定到不属于本地网卡的地址
net.ipv4.ip_nonlocal_bind=1
# 表示最大限度使用物理内存,然后才是swap空间
vm.swappiness = 0
# 设置系统TCP连接keepalive的持续时间,默认7200
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF

# 让sysctl参数生效
sysctl --system
  • 确保操作系统已经最新
yum update -y
  • 安装软件包
yum groups install base -y
yum install epel-release bash-completion-extras -y
yum install git vim ipvsadm tree dstat iotop htop socat ipset conntrack -y
  • 加载ipvs模块
# 开机自动加载ipvs模块
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_fo ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack_ipv4"
for kernel_module in \${ipvs_modules}; do
/sbin/modinfo -F filename \${kernel_module} > /dev/null 2>&1
if [ $? -eq 0 ]; then
/sbin/modprobe \${kernel_module}
fi
done
EOF

chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs
  • 安装docker-ce 18.09.2
yum remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-selinux docker-engine-selinux docker-engine -y
yum install -y yum-utils device-mapper-persistent-data lvm2 -y
yum-config-manager --add-repo http://mirrors.ustc.edu.cn/docker-ce/linux/centos/docker-ce.repo
sed -e 's,download.docker.com,mirrors.aliyun.com/docker-ce,g' -i /etc/yum.repos.d/docker-ce.repo
yum install docker-ce-18.09.2 -y
  • 创建docker配置文件
mkdir -p /etc/docker
cat>/etc/docker/daemon.json<<EOF
{
"registry-mirrors": ["https://registry.docker-cn.com"],
"insecure-registries": [],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
},
"max-concurrent-downloads": 10
}
EOF
  • 配置docker命令补全
cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/
source /etc/bash_completion.d/docker
  • 配置docker服务开机自启动
systemctl enable docker.service
systemctl start docker.service
  • 查看docker信息
docker info
  • 禁用docker源
# 为避免yum update时更新docker,将docker源禁用
sed -e 's,enabled=1,enabled=0,g' -i /etc/yum.repos.d/docker-ce.repo
  • 确保以最新的内核启动系统
reboot

定义集群变量

注意
  • 这里的变量只对当前会话生效,如果会话断开或者重启服务器,都需要重新定义变量
  • HostArray定义集群中所有节点的主机名和IP
  • MasterArray定义master节点的主机名和IP
  • WorkerArray定义worker节点的主机名和IP,这里master和worker都在一起,所以MasterArray和WorkerArray一样
  • VIP_IFACE定义keepalived的VIP绑定在哪一个网卡
  • ETCD_SERVERS以MasterArray的信息生成etcd集群服务器列表
  • ETCD_INITIAL_CLUSTER以MasterArray信息生成etcd集群初始化列表
  • POD_DNS_SERVER_IP定义Pod的DNS服务器IP地址
declare -A HostArray MasterArray WorkerArray
# 声明所有节点的信息
HostArray=(['k8s-m1']=172.16.80.201 ['k8s-m2']=172.16.80.202 ['k8s-m3']=172.16.80.203)
# 如果节点多,可以按照下面的方式声明Array
# HostArray=(['k8s-m1']=172.16.80.201 ['k8s-m2']=172.16.80.202 ['k8s-m3']=172.16.80.203 ['k8s-n1']=172.16.80.204 ['k8s-n2']=172.16.80.205)
# 声明master节点信息
MasterArray=(['k8s-m1']=172.16.80.201 ['k8s-m2']=172.16.80.202 ['k8s-m3']=172.16.80.203)
# 声明worker节点信息
WorkerArray=(['k8s-m1']=172.16.80.201 ['k8s-m2']=172.16.80.202 ['k8s-m3']=172.16.80.203)
#
VIP="172.16.80.200"
KUBE_APISERVER="https://172.16.80.200:8443"

# etcd版本号
# kubeadm-v1.11.5里面使用的是v3.2.18,这里直接上到最新的v3.3.10
ETCD_VERSION="v3.3.10"
# kubernetes版本号
KUBERNETES_VERSION="v1.11.5"
# cni-plugin版本号
# kubernetes YUM源里用的还是v0.6.0版,这里上到最新的v0.7.4
CNI_PLUGIN_VERSION="v0.7.4"

# 声明VIP所在的网卡名称,以ens33为例
VIP_IFACE="ens33"
# 声明etcd_server
ETCD_SERVERS=$( xargs -n1<<<${MasterArray[@]} | sort | sed 's#^#https://#;s#$#:2379#;$s#\n##' | paste -d, -s - )
ETCD_INITIAL_CLUSTER=$( for i in ${!MasterArray[@]};do echo $i=https://${MasterArray[$i]}:2380; done | sort | paste -d, -s - )

# 定义POD_CLUSTER_CIDR
POD_NET_CIDR="10.244.0.0/16"
# 定义SVC_CLUSTER_CIDR
SVC_CLUSTER_CIDR="10.96.0.0/12"
# 定义POD_DNS_SERVER_IP
POD_DNS_SERVER_IP="10.96.0.10"
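变量定义完成后,可以检查一下自动生成的etcd列表是否符合预期(以下示例输出基于上面的三节点信息):

echo ${ETCD_SERVERS}
# 预期输出
# https://172.16.80.201:2379,https://172.16.80.202:2379,https://172.16.80.203:2379
echo ${ETCD_INITIAL_CLUSTER}
# 预期输出
# k8s-m1=https://172.16.80.201:2380,k8s-m2=https://172.16.80.202:2380,k8s-m3=https://172.16.80.203:2380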

下载所需软件包

  • 创建工作目录
mkdir -p /root/software
cd /root/software
  • 二进制文件需要分发到master和worker节点
# 下载kubernetes二进制包
echo "--- 下载kubernetes ${KUBERNETES_VERSION} 二进制包 ---"
wget https://dl.k8s.io/${KUBERNETES_VERSION}/kubernetes-server-linux-amd64.tar.gz
tar xzf kubernetes-server-linux-amd64.tar.gz \
kubernetes/server/bin/hyperkube \
kubernetes/server/bin/kube-controller-manager \
kubernetes/server/bin/kubectl \
kubernetes/server/bin/apiextensions-apiserver \
kubernetes/server/bin/kube-proxy \
kubernetes/server/bin/kube-apiserver \
kubernetes/server/bin/kubelet \
kubernetes/server/bin/kubeadm \
kubernetes/server/bin/kube-aggregator \
kubernetes/server/bin/kube-scheduler \
kubernetes/server/bin/cloud-controller-manager \
kubernetes/server/bin/mounter
chown -R root:root kubernetes/server/bin/*
chmod 0755 kubernetes/server/bin/*
# 这里需要先拷贝kubectl到/usr/local/bin目录下,用于生成kubeconfig文件
rsync -avpt kubernetes/server/bin/kubectl /usr/local/bin/kubectl

# 下载etcd二进制包
echo "--- 下载etcd ${ETCD_VERSION} 二进制包 ---"
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
tar xzf etcd-${ETCD_VERSION}-linux-amd64.tar.gz \
etcd-${ETCD_VERSION}-linux-amd64/etcdctl \
etcd-${ETCD_VERSION}-linux-amd64/etcd
chown root:root etcd-${ETCD_VERSION}-linux-amd64/etcdctl etcd-${ETCD_VERSION}-linux-amd64/etcd
chmod 0755 etcd-${ETCD_VERSION}-linux-amd64/etcdctl etcd-${ETCD_VERSION}-linux-amd64/etcd

# 下载CNI-plugin
echo "--- 下载cni-plugins ${CNI_PLUGIN_VERSION} 二进制包 ---"
wget https://github.com/containernetworking/plugins/releases/download/${CNI_PLUGIN_VERSION}/cni-plugins-amd64-${CNI_PLUGIN_VERSION}.tgz
mkdir /root/software/cni-plugins
tar xzf cni-plugins-amd64-${CNI_PLUGIN_VERSION}.tgz -C /root/software/cni-plugins/

生成集群Key和Certificates

说明

本次部署,需要为etcd-server、etcd-client、kube-apiserver、kube-controller-manager、kube-scheduler、kube-proxy生成证书。另外还需要生成sa、front-proxy-ca、front-proxy-client证书用于集群的其他功能。

  • 要注意CA JSON文件的CN(Common Name)、O(Organization)等内容,它们会影响Kubernetes组件的认证。
    • CN 即Common Name,kube-apiserver会从证书中提取该字段作为请求的用户名(User Name)
    • O 即Organization,kube-apiserver会从证书中提取该字段作为请求用户的所属组(Group)
  • CA是自签名根证书,用来给后续各种证书签名
  • kubernetes集群的所有状态信息都保存在etcd中,kubernetes组件会通过kube-apiserver读写etcd里面的信息
  • etcd如果暴露在公网且没做SSL/TLS验证,那么任何人都能读写数据,很可能会莫名其妙地在kubernetes集群里多出挖矿Pod或者肉鸡Pod
  • 本文使用CFSSL创建证书,证书有效期10年
  • 建立证书过程在k8s-m1上完成

下载CFSSL工具

wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 -O /usr/local/bin/cfssl-certinfo
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -O /usr/local/bin/cfssl
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -O /usr/local/bin/cfssljson
chmod 755 /usr/local/bin/cfssl-certinfo \
/usr/local/bin/cfssl \
/usr/local/bin/cfssljson

创建工作目录

mkdir -p /root/pki /root/master /root/worker
cd /root/pki

创建用于生成证书的json文件

ca-config.json

cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
},
"etcd-server": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
},
"etcd-client": {
"usages": [
"signing",
"key encipherment",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF

ca-csr.json

cat > ca-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "Kubernetes",
"OU": "System"
}
]
}
EOF

etcd-ca-csr.json

cat > etcd-ca-csr.json <<EOF
{
"CN": "etcd",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "etcd",
"OU": "Etcd Security"
}
]
}
EOF

etcd-server-csr.json

cat > etcd-server-csr.json <<EOF
{
"CN": "etcd-server",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "etcd",
"OU": "Etcd Security"
}
]
}
EOF

etcd-client-csr.json

cat > etcd-client-csr.json <<EOF
{
"CN": "etcd-client",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": [
""
],
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "etcd",
"OU": "Etcd Security"
}
]
}
EOF

kube-apiserver-csr.json

cat > kube-apiserver-csr.json <<EOF
{
"CN": "kube-apiserver",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "Kubernetes",
"OU": "System"
}
]
}
EOF

kube-manager-csr.json

cat > kube-manager-csr.json <<EOF
{
"CN": "system:kube-controller-manager",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "system:kube-controller-manager",
"OU": "System"
}
]
}
EOF

kube-scheduler-csr.json

cat > kube-scheduler-csr.json <<EOF
{
"CN": "system:kube-scheduler",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "system:kube-scheduler",
"OU": "System"
}
]
}
EOF

kube-proxy-csr.json

cat > kube-proxy-csr.json <<EOF
{
"CN": "system:kube-proxy",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "system:kube-proxy",
"OU": "System"
}
]
}
EOF

kube-admin-csr.json

cat > kube-admin-csr.json <<EOF
{
"CN": "admin",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "system:masters",
"OU": "System"
}
]
}
EOF

front-proxy-ca-csr.json

cat > front-proxy-ca-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
}
}
EOF

front-proxy-client-csr.json

cat > front-proxy-client-csr.json <<EOF
{
"CN": "front-proxy-client",
"key": {
"algo": "rsa",
"size": 2048
}
}
EOF

sa-csr.json

cat > sa-csr.json <<EOF
{
"CN": "service-accounts",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "Guangdong",
"L": "Guangzhou",
"O": "Kubernetes",
"OU": "System"
}
]
}
EOF

创建etcd证书

etcd-ca证书

echo '--- 创建etcd-ca证书 ---'
cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare etcd-ca

etcd-server证书

echo '--- 创建etcd-server证书 ---'
cfssl gencert \
-ca=etcd-ca.pem \
-ca-key=etcd-ca-key.pem \
-config=ca-config.json \
-hostname=127.0.0.1,$(xargs -n1<<<${MasterArray[@]} | sort | paste -d, -s -) \
-profile=etcd-server etcd-server-csr.json | cfssljson -bare etcd-server

etcd-client证书

echo '--- 创建etcd-client证书 ---'
cfssl gencert \
-ca=etcd-ca.pem \
-ca-key=etcd-ca-key.pem \
-config=ca-config.json \
-profile=etcd-client etcd-client-csr.json | cfssljson -bare etcd-client

创建kubernetes证书

kubernetes-CA 证书

echo '--- 创建kubernetes-ca证书 ---'
# 创建kubernetes-ca证书
cfssl gencert -initca ca-csr.json | cfssljson -bare kube-ca

kube-apiserver证书

echo '--- 创建kube-apiserver证书 ---'
# 创建kube-apiserver证书
# 这里的hostname字段中的10.96.0.1要跟上文提到的service cluster ip cidr对应
cfssl gencert \
-ca=kube-ca.pem \
-ca-key=kube-ca-key.pem \
-config=ca-config.json \
-hostname=10.96.0.1,127.0.0.1,localhost,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster,kubernetes.default.svc.cluster.local,${VIP},$(xargs -n1<<<${MasterArray[@]} | sort | paste -d, -s -) \
-profile=kubernetes \
kube-apiserver-csr.json | cfssljson -bare kube-apiserver

kube-controller-manager证书

echo '--- 创建kube-controller-manager证书 ---'
# 创建kube-controller-manager证书
cfssl gencert \
-ca=kube-ca.pem \
-ca-key=kube-ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-manager-csr.json | cfssljson -bare kube-controller-manager

kube-scheduler证书

echo '--- 创建kube-scheduler证书 ---'
# 创建kube-scheduler证书
cfssl gencert \
-ca=kube-ca.pem \
-ca-key=kube-ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-scheduler-csr.json | cfssljson -bare kube-scheduler

kube-proxy证书

echo '--- 创建kube-proxy证书 ---'
# 创建kube-proxy证书
cfssl gencert \
-ca=kube-ca.pem \
-ca-key=kube-ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-proxy-csr.json | cfssljson -bare kube-proxy

kube-admin证书

echo '--- 创建kube-admin证书 ---'
# 创建kube-admin证书
cfssl gencert \
-ca=kube-ca.pem \
-ca-key=kube-ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-admin-csr.json | cfssljson -bare kube-admin

Front Proxy证书

echo '--- 创建Front Proxy Certificate证书 ---'
# 创建Front Proxy Certificate证书
cfssl gencert -initca front-proxy-ca-csr.json | cfssljson -bare front-proxy-ca
cfssl gencert \
-ca=front-proxy-ca.pem \
-ca-key=front-proxy-ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
front-proxy-client-csr.json | cfssljson -bare front-proxy-client

Service Account证书

echo '--- 创建service account证书 ---'
# 创建service account证书
cfssl gencert \
-ca=kube-ca.pem \
-ca-key=kube-ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
sa-csr.json | cfssljson -bare sa

bootstrap-token

BOOTSTRAP_TOKEN=$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64 | tr -d "=+/[:space:]" | dd bs=32 count=1 2>/dev/null)
echo "BOOTSTRAP_TOKEN: ${BOOTSTRAP_TOKEN}"

# 创建token.csv文件
cat > token.csv <<EOF
${BOOTSTRAP_TOKEN},kubelet-bootstrap,10001,"system:bootstrappers"
EOF

encryption.yaml

ENCRYPTION_TOKEN=$(head -c 32 /dev/urandom | base64)
echo "ENCRYPTION_TOKEN: ${ENCRYPTION_TOKEN}"

# 创建encryption.yaml文件
cat > encryption.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: ${ENCRYPTION_TOKEN}
  - identity: {}
EOF

audit-policy.yaml

echo '--- 创建高级审计配置 ---'
# 创建高级审计配置
cat > audit-policy.yaml <<EOF
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
# The following requests were manually identified as high-volume and low-risk,
# so drop them.
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core
resources: ["endpoints", "services", "services/status"]
- level: None
# Ingress controller reads 'configmaps/ingress-uid' through the unsecured port.
# TODO(#46983): Change this to the ingress controller service account.
users: ["system:unsecured"]
namespaces: ["kube-system"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["configmaps"]
- level: None
users: ["kubelet"] # legacy kubelet identity
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
userGroups: ["system:nodes"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
users:
- system:kube-controller-manager
- system:kube-scheduler
- system:serviceaccount:kube-system:endpoint-controller
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["endpoints"]
- level: None
users: ["system:apiserver"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
- level: None
users: ["cluster-autoscaler"]
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["configmaps", "endpoints"]
# Don't log HPA fetching metrics.
- level: None
users:
- system:kube-controller-manager
verbs: ["get", "list"]
resources:
- group: "metrics.k8s.io"

# Don't log these read-only URLs.
- level: None
nonResourceURLs:
- /healthz*
- /version
- /swagger*

# Don't log events requests.
- level: None
resources:
- group: "" # core
resources: ["events"]

# node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
- level: Request
users: ["kubelet", "system:node-problem-detector", "system:serviceaccount:kube-system:node-problem-detector"]
verbs: ["update","patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
- level: Request
userGroups: ["system:nodes"]
verbs: ["update","patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"

# deletecollection calls can be large, don't log responses for expected namespace deletions
- level: Request
users: ["system:serviceaccount:kube-system:namespace-controller"]
verbs: ["deletecollection"]
omitStages:
- "RequestReceived"

# Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
resources:
- group: "" # core
resources: ["secrets", "configmaps"]
- group: authentication.k8s.io
resources: ["tokenreviews"]
omitStages:
- "RequestReceived"
# Get responses can be large; skip them.
- level: Request
verbs: ["get", "list", "watch"]
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "settings.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default level for known APIs
- level: RequestResponse
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "settings.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default level for all other requests.
- level: Metadata
omitStages:
- "RequestReceived"
EOF

创建kubeconfig文件

说明

  • kubeconfig 文件用于组织关于集群、用户、命名空间和认证机制的信息。
  • 命令行工具 kubectl 从 kubeconfig 文件中得到它要选择的集群以及跟集群 API server 交互的信息。
  • 默认情况下,kubectl 会从 $HOME/.kube 目录下查找文件名为 config 的文件。

注意: 用于配置集群访问信息的文件叫作 kubeconfig文件,这是一种引用配置文件的通用方式,并不是说它的文件名就是 kubeconfig
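下文各kubeconfig文件生成之后,可以用kubectl config view检查其内容是否符合预期(证书等敏感数据会被隐藏显示),例如:

kubectl config view --kubeconfig=kube-admin.kubeconfig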

kube-controller-manager.kubeconfig

echo "Create kube-controller-manager kubeconfig..."
# 设置集群参数
kubectl config set-cluster kubernetes \
--certificate-authority=kube-ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-controller-manager.kubeconfig
# 设置客户端认证参数
kubectl config set-credentials system:kube-controller-manager \
--client-certificate=kube-controller-manager.pem \
--client-key=kube-controller-manager-key.pem \
--embed-certs=true \
--kubeconfig=kube-controller-manager.kubeconfig
# 设置上下文参数
kubectl config set-context default \
--cluster=kubernetes \
--user=system:kube-controller-manager \
--kubeconfig=kube-controller-manager.kubeconfig
# 设置默认上下文
kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig

kube-scheduler.kubeconfig

echo "Create kube-scheduler kubeconfig..."
# 设置集群参数
kubectl config set-cluster kubernetes \
--certificate-authority=kube-ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-scheduler.kubeconfig
# 设置客户端认证参数
kubectl config set-credentials system:kube-scheduler \
--client-certificate=kube-scheduler.pem \
--client-key=kube-scheduler-key.pem \
--embed-certs=true \
--kubeconfig=kube-scheduler.kubeconfig
# 设置上下文参数
kubectl config set-context default \
--cluster=kubernetes \
--user=system:kube-scheduler \
--kubeconfig=kube-scheduler.kubeconfig
# 设置默认上下文
kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig

kube-proxy.kubeconfig

echo "Create kube-proxy kubeconfig..."
# 设置集群参数
kubectl config set-cluster kubernetes \
--certificate-authority=kube-ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-proxy.kubeconfig
# 设置客户端认证参数
kubectl config set-credentials system:kube-proxy \
--client-certificate=kube-proxy.pem \
--client-key=kube-proxy-key.pem \
--embed-certs=true \
--kubeconfig=kube-proxy.kubeconfig
# 设置上下文参数
kubectl config set-context default \
--cluster=kubernetes \
--user=system:kube-proxy \
--kubeconfig=kube-proxy.kubeconfig
# 设置默认上下文
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

kube-admin.kubeconfig

echo "Create kube-admin kubeconfig..."
# 设置集群参数
kubectl config set-cluster kubernetes \
--certificate-authority=kube-ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-admin.kubeconfig
# 设置客户端认证参数
kubectl config set-credentials kubernetes-admin \
--client-certificate=kube-admin.pem \
--client-key=kube-admin-key.pem \
--embed-certs=true \
--kubeconfig=kube-admin.kubeconfig
# 设置上下文参数
kubectl config set-context default \
--cluster=kubernetes \
--user=kubernetes-admin \
--kubeconfig=kube-admin.kubeconfig
# 设置默认上下文
kubectl config use-context default --kubeconfig=kube-admin.kubeconfig

bootstrap.kubeconfig

echo "Create kubelet bootstrapping kubeconfig..."
# 设置集群参数
kubectl config set-cluster kubernetes \
--certificate-authority=kube-ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=bootstrap.kubeconfig
# 设置客户端认证参数
kubectl config set-credentials kubelet-bootstrap \
--token=${BOOTSTRAP_TOKEN} \
--kubeconfig=bootstrap.kubeconfig
# 设置上下文参数
kubectl config set-context default \
--cluster=kubernetes \
--user=kubelet-bootstrap \
--kubeconfig=bootstrap.kubeconfig
# 设置默认上下文
kubectl config use-context default --kubeconfig=bootstrap.kubeconfig

清理证书CSR文件

echo '--- 删除*.csr文件 ---'
rm -rf *csr

修改文件权限

chown root:root *pem *kubeconfig *yaml *csv
chmod 0444 *pem *kubeconfig *yaml *csv
chmod 0400 *key.pem

检查生成的文件

ls -l | grep -v json
-r--r--r-- 1 root root 113 Dec 6 15:36 audit-policy.yaml
-r--r--r-- 1 root root 2207 Dec 6 15:36 bootstrap.kubeconfig
-r--r--r-- 1 root root 240 Dec 6 15:36 encryption.yaml
-r-------- 1 root root 1675 Dec 6 15:36 etcd-ca-key.pem
-r--r--r-- 1 root root 1375 Dec 6 15:36 etcd-ca.pem
-r-------- 1 root root 1679 Dec 6 15:36 etcd-client-key.pem
-r--r--r-- 1 root root 1424 Dec 6 15:36 etcd-client.pem
-r-------- 1 root root 1679 Dec 6 15:36 etcd-server-key.pem
-r--r--r-- 1 root root 1468 Dec 6 15:36 etcd-server.pem
-r-------- 1 root root 1679 Dec 6 15:36 front-proxy-ca-key.pem
-r--r--r-- 1 root root 1143 Dec 6 15:36 front-proxy-ca.pem
-r-------- 1 root root 1675 Dec 6 15:36 front-proxy-client-key.pem
-r--r--r-- 1 root root 1188 Dec 6 15:36 front-proxy-client.pem
-r-------- 1 root root 1679 Dec 6 15:36 kube-admin-key.pem
-r--r--r-- 1 root root 6345 Dec 6 15:36 kube-admin.kubeconfig
-r--r--r-- 1 root root 1419 Dec 6 15:36 kube-admin.pem
-r-------- 1 root root 1675 Dec 6 15:36 kube-apiserver-key.pem
-r--r--r-- 1 root root 1688 Dec 6 15:36 kube-apiserver.pem
-r-------- 1 root root 1679 Dec 6 15:36 kube-ca-key.pem
-r--r--r-- 1 root root 1387 Dec 6 15:36 kube-ca.pem
-r-------- 1 root root 1679 Dec 6 15:36 kube-controller-manager-key.pem
-r--r--r-- 1 root root 6449 Dec 6 15:36 kube-controller-manager.kubeconfig
-r--r--r-- 1 root root 1476 Dec 6 15:36 kube-controller-manager.pem
-r-------- 1 root root 1675 Dec 6 15:36 kube-proxy-key.pem
-r--r--r-- 1 root root 6371 Dec 6 15:36 kube-proxy.kubeconfig
-r--r--r-- 1 root root 1440 Dec 6 15:36 kube-proxy.pem
-r-------- 1 root root 1675 Dec 6 15:36 kube-scheduler-key.pem
-r--r--r-- 1 root root 6395 Dec 6 15:36 kube-scheduler.kubeconfig
-r--r--r-- 1 root root 1452 Dec 6 15:36 kube-scheduler.pem
-r-------- 1 root root 1675 Dec 6 15:36 sa-key.pem
-r--r--r-- 1 root root 1432 Dec 6 15:36 sa.pem
-r--r--r-- 1 root root 80 Dec 6 15:36 token.csv
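(可选)可以用前面下载的cfssl-certinfo或openssl检查证书的有效期、hosts等内容是否符合预期,例如:

cfssl-certinfo -cert kube-apiserver.pem
# 或者用openssl查看有效期和SAN
openssl x509 -in kube-apiserver.pem -noout -dates
openssl x509 -in kube-apiserver.pem -noout -text | grep -A1 'Subject Alternative Name'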

kubernetes-master节点

本节介绍如何部署kubernetes master节点

master节点说明

  • 原则上,master节点不应该运行业务Pod,且不应该暴露到公网环境!!
  • 边界节点,应该交由worker节点或者运行Ingress的节点来承担
  • 以kubeadm部署为例,部署完成后,会给master节点添加node-role.kubernetes.io/master=''标签(Labels),并对带有此标签的节点添加node-role.kubernetes.io/master:NoSchedule污点(taints),这样不能容忍此污点的Pod就无法调度到master节点
  • 本文中,在kubelet启动参数里,默认添加node-role.kubernetes.io/node=''标签(Labels),且没有对master节点添加node-role.kubernetes.io/master:NoSchedule污点(taints)
  • 生产环境中最好参照kubeadm,对master节点添加node-role.kubernetes.io/master=''标签(Labels)和node-role.kubernetes.io/master:NoSchedule污点(taints),可参考下面的示例命令
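如果要像kubeadm那样把master节点保留给控制平面,可以在后文worker节点注册进集群之后,参考下面的示例命令补上标签和污点(以k8s-m1为例,其余master节点同理):

kubectl label node k8s-m1 node-role.kubernetes.io/master=''
kubectl taint node k8s-m1 node-role.kubernetes.io/master='':NoSchedule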

kube-apiserver

  • 以 REST APIs 提供 Kubernetes 资源的 CRUD,如授权、认证、存取控制与 API 注册等机制。
  • 关闭默认非安全端口8080,在安全端口 6443 接收 https 请求
  • 严格的认证和授权策略 (x509、token、RBAC)
  • 开启 bootstrap token 认证,支持 kubelet TLS bootstrapping
  • 使用 https 访问 kubelet、etcd,加密通信

kube-controller-manager

  • 通过核心控制循环(Core Control Loop)监听 Kubernetes API
    的资源来维护集群的状态,这些资源会被不同的控制器所管理,如 Replication Controller、Namespace
    Controller 等等。而这些控制器会处理着自动扩展、滚动更新等等功能。
  • 关闭非安全端口 10252,在安全端口 10257 接收 https 请求
  • 使用 kubeconfig 访问 kube-apiserver 的安全端口

kube-scheduler

  • 负责将一个(或多个)容器依据调度策略分配到对应节点上让容器引擎(如 Docker)执行。
  • 调度受到 QoS 要求、软硬性约束、亲和性(Affinity)等等因素影响。

HAProxy

  • 提供多个 API Server 的负载均衡(Load Balance)
  • 监听VIP的8443端口负载均衡到三台master节点的6443端口

Keepalived

  • 提供虚拟IP地址(VIP),让VIP落在可用的master主机上,供所有组件访问master节点
  • 提供健康检查脚本用于切换VIP

添加用户

  • 这里强迫症发作,指定了UID和GID
  • 不指定UID和GID也可以
echo '--- master节点添加用户 ---'
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ---"
ssh ${NODE} /usr/sbin/groupadd -r -g 10000 kube
ssh ${NODE} /usr/sbin/groupadd -r -g 10001 etcd
ssh ${NODE} /usr/sbin/useradd -r -g kube -u 10000 -s /bin/false kube
ssh ${NODE} /usr/sbin/useradd -r -g etcd -u 10001 -s /bin/false etcd
done

创建目录

echo '--- master节点创建目录 ---'
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ---"
echo "--- 创建目录 ---"
ssh ${NODE} /usr/bin/mkdir -p /etc/etcd/ssl \
/etc/kubernetes/pki \
/etc/kubernetes/manifests \
/var/lib/etcd \
/var/lib/kubelet \
/var/run/kubernetes \
/var/log/kube-audit \
/etc/cni/net.d \
/opt/cni/bin
echo "--- 修改目录权限 ---"
ssh ${NODE} /usr/bin/chmod 0755 /etc/etcd \
/etc/etcd/ssl \
/etc/kubernetes \
/etc/kubernetes/pki \
/var/lib/etcd \
/var/lib/kubelet \
/var/log/kube-audit \
/var/run/kubernetes \
/etc/cni/net.d \
/opt/cni/bin
echo "--- 修改目录属组 ---"
ssh ${NODE} chown -R etcd:etcd /etc/etcd/ /var/lib/etcd
ssh ${NODE} chown -R kube:kube /etc/kubernetes \
/var/lib/kubelet \
/var/log/kube-audit \
/var/run/kubernetes
done

分发证书文件和kubeconfig到master节点

for NODE in "${!MasterArray[@]}";do
echo "---- $NODE ----"
echo '---- 分发etcd证书 ----'
rsync -avpt /root/pki/etcd-ca-key.pem \
/root/pki/etcd-ca.pem \
/root/pki/etcd-client-key.pem \
/root/pki/etcd-client.pem \
/root/pki/etcd-server-key.pem \
/root/pki/etcd-server.pem \
$NODE:/etc/etcd/ssl/
echo '---- 分发kubeconfig文件 yaml文件 token.csv ----'
rsync -avpt /root/pki/kube-admin.kubeconfig \
/root/pki/kube-controller-manager.kubeconfig \
/root/pki/kube-scheduler.kubeconfig \
/root/pki/audit-policy.yaml \
/root/pki/encryption.yaml \
/root/pki/token.csv \
$NODE:/etc/kubernetes/
echo '---- 分发sa证书 kube证书 front-proxy证书 ----'
rsync -avpt /root/pki/etcd-ca.pem \
/root/pki/etcd-client-key.pem \
/root/pki/etcd-client.pem \
/root/pki/front-proxy-ca.pem \
/root/pki/front-proxy-client-key.pem \
/root/pki/front-proxy-client.pem \
/root/pki/kube-apiserver-key.pem \
/root/pki/kube-apiserver.pem \
/root/pki/kube-ca.pem \
/root/pki/kube-ca-key.pem \
/root/pki/sa-key.pem \
/root/pki/sa.pem \
$NODE:/etc/kubernetes/pki/
ssh $NODE chown -R etcd:etcd /etc/etcd
ssh $NODE chown -R kube:kube /etc/kubernetes
done

分发二进制文件

  • k8s-m1上操作
echo '--- 分发kubernetes和etcd二进制文件 ---'
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ---"
rsync -avpt /root/software/kubernetes/server/bin/hyperkube \
/root/software/kubernetes/server/bin/kube-controller-manager \
/root/software/kubernetes/server/bin/kubectl \
/root/software/kubernetes/server/bin/apiextensions-apiserver \
/root/software/kubernetes/server/bin/kube-apiserver \
/root/software/kubernetes/server/bin/kubeadm \
/root/software/kubernetes/server/bin/kube-aggregator \
/root/software/kubernetes/server/bin/kube-scheduler \
/root/software/kubernetes/server/bin/cloud-controller-manager \
/root/software/kubernetes/server/bin/mounter \
/root/software/etcd-${ETCD_VERSION}-linux-amd64/etcdctl \
/root/software/etcd-${ETCD_VERSION}-linux-amd64/etcd \
$NODE:/usr/local/bin/
done

部署配置Keepalived和HAProxy

  • k8s-m1上操作

切换工作目录

cd /root/master

安装Keepalived和HAProxy

for NODE in "${!MasterArray[@]}";do
echo "---- $NODE ----"
echo "---- 安装haproxy和keepalived ----"
ssh $NODE yum install keepalived haproxy -y
done

配置keepalived

  • 编辑keepalived.conf模板
  • 替换keepalived.conf的字符串
  • 编辑check_haproxy.sh
cat > keepalived.conf.example <<EOF
vrrp_script haproxy-check {
script "/bin/bash /etc/keepalived/check_haproxy.sh"
interval 3
weight -2
fall 10
rise 2
}

vrrp_instance haproxy-vip {
state BACKUP
priority 101
interface {{ VIP_IFACE }}
virtual_router_id 47
advert_int 3

unicast_peer {
}

virtual_ipaddress {
{{ VIP }}
}

track_script {
haproxy-check
}
}
EOF


# 替换字符
sed -r -e "s#\{\{ VIP \}\}#${VIP}#" \
-e "s#\{\{ VIP_IFACE \}\}#${VIP_IFACE}#" \
-e '/unicast_peer/r '<(xargs -n1<<<${MasterArray[@]} | sort | sed 's#^#\t#') \
keepalived.conf.example > keepalived.conf
cat > check_haproxy.sh <<EOF
#!/bin/bash
VIRTUAL_IP=${VIP}

errorExit() {
echo "*** $*" 1>&2
exit 1
}

if ip addr | grep -q \$VIRTUAL_IP ; then
curl -s --max-time 2 --insecure https://\${VIRTUAL_IP}:8443/ -o /dev/null || errorExit "Error GET https://\${VIRTUAL_IP}:8443/"
fi
EOF

配置haproxy

  • 编辑haproxy.cfg模板
cat > haproxy.cfg.example <<EOF
global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s

defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s

frontend monitor-in
bind ${VIP}:33305
mode http
option httplog
monitor-uri /monitor

listen stats
bind ${VIP}:8006
mode http
stats enable
stats hide-version
stats uri /stats
stats refresh 30s
stats realm Haproxy\ Statistics
stats auth admin:admin

frontend k8s-api
bind ${VIP}:8443
mode tcp
option tcplog
tcp-request inspect-delay 5s
default_backend k8s-api

backend k8s-api
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
EOF


# 替换字符
sed -e '$r '<(paste <( seq -f' server k8s-api-%g' ${#MasterArray[@]} ) <( xargs -n1<<<${MasterArray[@]} | sort | sed 's#$#:6443 check#')) haproxy.cfg.example > haproxy.cfg
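以上面的MasterArray为例,sed之后追加到haproxy.cfg中backend k8s-api末尾的server条目大致如下(实际以制表符分隔):

 server k8s-api-1 172.16.80.201:6443 check
 server k8s-api-2 172.16.80.202:6443 check
 server k8s-api-3 172.16.80.203:6443 check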

分发配置文件到master节点

for NODE in "${!MasterArray[@]}";do
echo "---- $NODE ----"
rsync -avpt haproxy.cfg $NODE:/etc/haproxy/
rsync -avpt keepalived.conf \
check_haproxy.sh \
$NODE:/etc/keepalived/
done

启动keepalived和haproxy

for NODE in "${!MasterArray[@]}";do
echo "---- $NODE ----"
ssh $NODE systemctl enable --now keepalived haproxy
done

验证VIP

  • 需要大约十秒的时间等待keepalived和haproxy服务起来
  • 这里由于后端的kube-apiserver服务还没启动,只测试是否能ping通VIP
  • 如果VIP没起来,就要去确认一下各master节点的keepalived服务是否正常
sleep 15
ping -c 4 $VIP
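如果想进一步确认VIP当前落在哪台master上,可以检查各节点VIP_IFACE网卡上的地址(示例):

for NODE in "${!MasterArray[@]}";do
  echo "--- $NODE ---"
  ssh $NODE ip addr show ${VIP_IFACE} | grep -w ${VIP} || true
done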

部署etcd集群

  • 每个etcd节点的配置都需要做对应更改
  • k8s-m1上操作

配置etcd.service文件

cat > etcd.service <<EOF
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yaml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service
EOF

etcd.config.yaml模板

  • 关于各个参数的说明可以看这里
cat > etcd.config.yaml.example <<EOF
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: '{HOSTNAME}'
# Path to the data directory.
data-dir: '/var/lib/etcd/{HOSTNAME}.data/'
# Path to the dedicated wal directory.
wal-dir: '/var/lib/etcd/{HOSTNAME}.wal/'
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 5000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: 'https://{PUBLIC_IP}:2380'
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: 'https://{PUBLIC_IP}:2379,http://127.0.0.1:2379'
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 3
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: 'https://{PUBLIC_IP}:2380'
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: 'https://{PUBLIC_IP}:2379'
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: '${ETCD_INITIAL_CLUSTER}'
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-k8s-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
  # Path to the client server TLS cert file.
  cert-file: '/etc/etcd/ssl/etcd-server.pem'
  # Path to the client server TLS key file.
  key-file: '/etc/etcd/ssl/etcd-server-key.pem'
  # Enable client cert authentication.
  client-cert-auth: true
  # Path to the client server TLS trusted CA cert file.
  trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem'
  # Client TLS using generated certificates
  auto-tls: true
peer-transport-security:
  # Path to the peer server TLS cert file.
  cert-file: '/etc/etcd/ssl/etcd-server.pem'
  # Path to the peer server TLS key file.
  key-file: '/etc/etcd/ssl/etcd-server-key.pem'
  # Enable peer client cert authentication.
  client-cert-auth: true
  # Path to the peer server TLS trusted CA cert file.
  trusted-ca-file: '/etc/etcd/ssl/etcd-ca.pem'
  # Peer TLS using generated certificates.
  auto-tls: true
# Enable debug-level logging for etcd.
debug: false
logger: 'zap'
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [default]
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: 'periodic'
auto-compaction-retention: '1'
# Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
# default is 'basic'
metrics: 'basic'
EOF

分发配置文件

# 根据节点信息替换文本,分发到各etcd节点
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ${MasterArray[$NODE]} ---"
sed -e "s/{HOSTNAME}/$NODE/g" \
-e "s/{PUBLIC_IP}/${MasterArray[$NODE]}/g" \
etcd.config.yaml.example > etcd.config.yaml.${NODE}
rsync -avpt etcd.config.yaml.${NODE} ${NODE}:/etc/etcd/etcd.config.yaml
rsync -avpt etcd.service ${NODE}:/usr/lib/systemd/system/etcd.service
ssh ${NODE} systemctl daemon-reload
ssh ${NODE} chown -R etcd:etcd /etc/etcd
rm -rf etcd.config.yaml.${NODE}
done

启动etcd集群

  • etcd 进程首次启动时会等待其它节点的 etcd 加入集群,命令 systemctl start etcd 会卡住一段时间,为正常现象
  • 启动之后可以通过etcdctl命令查看集群状态
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ${MasterArray[$NODE]} ---"
ssh $NODE systemctl enable etcd
ssh $NODE systemctl start etcd &
done
  • 为方便维护,可使用alias简化etcdctl命令
cat >> /root/.bashrc <<EOF
alias etcdctl2="export ETCDCTL_API=2;etcdctl --ca-file '/etc/etcd/ssl/etcd-ca.pem' --cert-file '/etc/etcd/ssl/etcd-client.pem' --key-file '/etc/etcd/ssl/etcd-client-key.pem' --endpoints ${ETCD_SERVERS}"
alias etcdctl3="export ETCDCTL_API=3;etcdctl --cacert=/etc/etcd/ssl/etcd-ca.pem --cert=/etc/etcd/ssl/etcd-client.pem --key=/etc/etcd/ssl/etcd-client-key.pem --endpoints=${ETCD_SERVERS}"
EOF

验证etcd集群状态

  • etcd提供v2和v3两套API,kubernetes使用v3
# 应用上面定义的alias
source /root/.bashrc
# 使用v2 API访问etcd的集群状态
etcdctl2 cluster-health
# 示例输出
member 222fd3b0bb4a5931 is healthy: got healthy result from https://172.16.80.203:2379
member 8349ef180b115a83 is healthy: got healthy result from https://172.16.80.201:2379
member f525d2d797a7c465 is healthy: got healthy result from https://172.16.80.202:2379
cluster is healthy
# 使用v2 API访问etcd成员列表
etcdctl2 member list
# 示例输出
222fd3b0bb4a5931: name=k8s-m3 peerURLs=https://172.16.80.203:2380 clientURLs=https://172.16.80.203:2379 isLeader=false
8349ef180b115a83: name=k8s-m1 peerURLs=https://172.16.80.201:2380 clientURLs=https://172.16.80.201:2379 isLeader=false
f525d2d797a7c465: name=k8s-m2 peerURLs=https://172.16.80.202:2380 clientURLs=https://172.16.80.202:2379 isLeader=true


# 使用v3 API访问etcd的endpoint状态
etcdctl3 endpoint health
# 示例输出
https://172.16.80.201:2379 is healthy: successfully committed proposal: took = 2.879402ms
https://172.16.80.203:2379 is healthy: successfully committed proposal: took = 6.708566ms
https://172.16.80.202:2379 is healthy: successfully committed proposal: took = 7.187607ms
# 使用v3 API访问etcd成员列表
etcdctl3 member list --write-out=table
# 示例输出
+------------------+---------+--------+----------------------------+----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+--------+----------------------------+----------------------------+
| 222fd3b0bb4a5931 | started | k8s-m3 | https://172.16.80.203:2380 | https://172.16.80.203:2379 |
| 8349ef180b115a83 | started | k8s-m1 | https://172.16.80.201:2380 | https://172.16.80.201:2379 |
| f525d2d797a7c465 | started | k8s-m2 | https://172.16.80.202:2380 | https://172.16.80.202:2379 |
+------------------+---------+--------+----------------------------+----------------------------+

Master组件服务

master组件配置模板

kube-apiserver.conf

  • --allow-privileged=true启用容器特权模式

  • --apiserver-count=3指定集群中运行的kube-apiserver实例数量

  • --audit-policy-file=/etc/kubernetes/audit-policy.yaml 基于audit-policy.yaml文件定义的内容启动审计功能

  • --authorization-mode=Node,RBAC开启 Node 和 RBAC 授权模式,拒绝未授权的请求

  • --disable-admission-plugins和--enable-admission-plugins分别用于禁用和启用准入控制插件。

准入控制插件会在请求通过认证和授权之后、对象被持久化之前拦截到达apiserver的请求。

准入控制插件依次执行,因此需要注意顺序。

如果插件序列中任何一个拒绝了请求,则整个请求将立刻被拒绝并返回错误给客户端。

关于admission-plugins官方文档里面有推荐配置,这里直接采用官方配置,注意针对不同kubernetes版本都会有不一样的配置,具体可以看这里

  • --enable-bootstrap-token-auth=true启用 kubelet bootstrap 的 token 认证
  • --experimental-encryption-provider-config=/etc/kubernetes/encryption.yaml启用加密特性将Secret数据加密存储到etcd
  • --insecure-port=0关闭监听非安全端口8080
  • --runtime-config=api/all=true启用所有版本的 APIs
  • --service-cluster-ip-range=10.96.0.0/12指定 Service Cluster IP 地址段
  • --service-node-port-range=30000-32767指定 NodePort 的端口范围
  • --token-auth-file=/etc/kubernetes/token.csv保存bootstrap的token信息
  • --target-ram-mb配置缓存大小,参考值为节点数*60
cat > kube-apiserver.conf.example <<EOF
KUBE_APISERVER_ARGS=" \\
--advertise-address={PUBLIC_IP} \\
--allow-privileged=true \\
--apiserver-count=3 \\
--audit-log-maxage=30 \\
--audit-log-maxbackup=3 \\
--audit-log-maxsize=1000 \\
--audit-log-path=/var/log/kube-audit/audit.log \\
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \\
--authorization-mode=Node,RBAC \\
--bind-address=0.0.0.0 \\
--client-ca-file=/etc/kubernetes/pki/kube-ca.pem \\
--disable-admission-plugins=PersistentVolumeLabel \\
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,PodPreset \\
--enable-aggregator-routing=true \\
--enable-bootstrap-token-auth=true \\
--enable-garbage-collector=true \\
--etcd-compaction-interval=1h \\
--etcd-cafile=/etc/kubernetes/pki/etcd-ca.pem \\
--etcd-certfile=/etc/kubernetes/pki/etcd-client.pem \\
--etcd-keyfile=/etc/kubernetes/pki/etcd-client-key.pem \\
--etcd-servers=$ETCD_SERVERS \\
--experimental-encryption-provider-config=/etc/kubernetes/encryption.yaml \\
--event-ttl=1h \\
--feature-gates=PodShareProcessNamespace=true,ExpandPersistentVolumes=true \\
--insecure-port=0 \\
--kubelet-client-certificate=/etc/kubernetes/pki/kube-apiserver.pem \\
--kubelet-client-key=/etc/kubernetes/pki/kube-apiserver-key.pem \\
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\
--logtostderr=true \\
--max-mutating-requests-inflight=500 \\
--max-requests-inflight=1500 \\
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem \\
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem \\
--requestheader-allowed-names=aggregator \\
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem \\
--requestheader-extra-headers-prefix=X-Remote-Extra- \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--runtime-config=api/all=true \\
--secure-port=6443 \\
--service-account-key-file=/etc/kubernetes/pki/sa.pem \\
--service-cluster-ip-range=10.96.0.0/12 \\
--service-node-port-range=30000-32767 \\
--storage-backend=etcd3 \\
--target-ram-mb=300 \\
--tls-cert-file=/etc/kubernetes/pki/kube-apiserver.pem \\
--tls-private-key-file=/etc/kubernetes/pki/kube-apiserver-key.pem \\
--token-auth-file=/etc/kubernetes/token.csv \\
--v=2 \\
"
EOF

kube-controller-manager.conf

  • --allocate-node-cidrs=true在cloud provider上分配和设置pod的CIDR
  • --cluster-cidr集群内的pod的CIDR范围,需要 --allocate-node-cidrs设为true
  • --experimental-cluster-signing-duration=8670h0m0s指定 TLS Bootstrap 证书的有效期
  • --feature-gates=RotateKubeletServerCertificate=true开启 kublet server 证书的自动更新特性
  • --horizontal-pod-autoscaler-use-rest-clients=true能够使用自定义资源(Custom Metrics)进行自动水平扩展
  • --leader-elect=true集群运行模式,启用选举功能,被选为 leader 的节点负责处理工作,其它节点为阻塞状态
  • --node-cidr-mask-size=24集群中node cidr的掩码
  • --service-cluster-ip-range=10.96.0.0/12指定 Service Cluster IP 网段,必须和 kube-apiserver 中的同名参数一致
  • --terminated-pod-gc-thresholdexit状态的pod超过多少会触发gc
cat > kube-controller-manager.conf.example <<EOF
KUBE_CONTROLLER_MANAGER_ARGS=" \\
--address=0.0.0.0 \\
--allocate-node-cidrs=true \\
--cluster-cidr=$POD_NET_CIDR \\
--cluster-signing-cert-file=/etc/kubernetes/pki/kube-ca.pem \\
--cluster-signing-key-file=/etc/kubernetes/pki/kube-ca-key.pem \\
--concurrent-service-syncs=10 \\
--concurrent-serviceaccount-token-syncs=20 \\
--controllers=*,bootstrapsigner,tokencleaner \\
--enable-garbage-collector=true \\
--experimental-cluster-signing-duration=8670h0m0s \\
--feature-gates=RotateKubeletServerCertificate=true,ExpandPersistentVolumes=true \\
--horizontal-pod-autoscaler-sync-period=10s \\
--horizontal-pod-autoscaler-use-rest-clients=true \\
--kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--leader-elect=true \\
--logtostderr=true \\
--node-cidr-mask-size=24 \\
--node-monitor-grace-period=40s \\
--node-monitor-period=5s \\
--pod-eviction-timeout=2m0s \\
--root-ca-file=/etc/kubernetes/pki/kube-ca.pem \\
--service-account-private-key-file=/etc/kubernetes/pki/sa-key.pem \\
--service-cluster-ip-range=$SVC_CLUSTER_CIDR \\
--terminated-pod-gc-threshold=12500 \\
--use-service-account-credentials=true \\
--v=2 \\
"
EOF

kube-scheduler.conf

  • --leader-elect=true集群运行模式,启用选举功能,被选为 leader 的节点负责处理工作,其它节点为阻塞状态
cat > kube-scheduler.conf.example <<EOF
KUBE_SCHEDULER_ARGS="\\
--address=0.0.0.0 \\
--algorithm-provider=DefaultProvider \\
--kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--leader-elect=true \\
--logtostderr=true \\
--v=2 \\
"
EOF

systemd服务文件

kube-apiserver.service

cat > kube-apiserver.service <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
After=etcd.service

[Service]
User=kube
EnvironmentFile=-/etc/kubernetes/kube-apiserver.conf
ExecStart=/usr/local/bin/kube-apiserver \$KUBE_APISERVER_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

kube-controller-manager.service

cat > kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
User=kube
EnvironmentFile=-/etc/kubernetes/kube-controller-manager.conf
ExecStart=/usr/local/bin/kube-controller-manager \$KUBE_CONTROLLER_MANAGER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

kube-scheduler.service

cat > kube-scheduler.service <<EOF
[Unit]
Description=Kubernetes Scheduler Plugin
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
User=kube
EnvironmentFile=-/etc/kubernetes/kube-scheduler.conf
ExecStart=/usr/local/bin/kube-scheduler \$KUBE_SCHEDULER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

分发配置文件到各master节点

  • 根据master节点的信息替换配置文件里面的字段
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ${MasterArray[$NODE]} ---"
rsync -avpt kube*service $NODE:/usr/lib/systemd/system/
sed -e "s/{PUBLIC_IP}/${MasterArray[$NODE]}/g" kube-apiserver.conf.example > kube-apiserver.conf.${NODE}
rsync -avpt kube-apiserver.conf.${NODE} $NODE:/etc/kubernetes/kube-apiserver.conf
rsync -avpt kube-controller-manager.conf.example $NODE:/etc/kubernetes/kube-controller-manager.conf
rsync -avpt kube-scheduler.conf.example $NODE:/etc/kubernetes/kube-scheduler.conf
rm -rf *conf.${NODE}
ssh $NODE systemctl daemon-reload
ssh $NODE chown -R kube:kube /etc/kubernetes
done

启动kubernetes服务

  • 可以先在k8s-m1上面启动服务,确认正常之后再在其他master节点启动
systemctl enable --now kube-apiserver.service
systemctl enable --now kube-controller-manager.service
systemctl enable --now kube-scheduler.service
kubectl --kubeconfig=/etc/kubernetes/kube-admin.kubeconfig get cs
# 输出示例
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}

kubectl --kubeconfig=/etc/kubernetes/kube-admin.kubeconfig get endpoints
# 输出示例
NAME ENDPOINTS AGE
kubernetes 172.16.80.201:6443 27s
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ${MasterArray[$NODE]} ---"
ssh $NODE "systemctl enable --now kube-apiserver"
ssh $NODE "systemctl enable --now kube-controller-manager"
ssh $NODE "systemctl enable --now kube-scheduler"
done
  • 三台master节点的kube-apiserverkube-controller-managerkube-scheduler服务启动成功后可以测试一下
kubectl --kubeconfig=/etc/kubernetes/kube-admin.kubeconfig get endpoints
# 输出示例
NAME ENDPOINTS AGE
kubernetes 172.16.80.201:6443,172.16.80.202:6443,172.16.80.203:6443 12m

设置kubectl

  • kubectl命令默认会加载~/.kube/config文件,如果文件不存在则连接http://127.0.0.1:8080,这显然不符合预期,这里使用之前生成的kube-admin.kubeconfig
  • k8s-m1上操作
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ${MasterArray[$NODE]} ---"
ssh $NODE mkdir -p /root/.kube
rsync -avpt /root/pki/kube-admin.kubeconfig $NODE:/root/.kube/config
done

设置命令补全

  • 设置kubectl 命令自动补全
for NODE in "${!MasterArray[@]}";do
echo "--- $NODE ${MasterArray[$NODE]} ---"
echo "--- kubectl命令自动补全 ---"
ssh $NODE "kubectl completion bash > /etc/bash_completion.d/kubectl"
echo "--- kubeadm命令自动补全 ---"
ssh $NODE "kubeadm completion bash > /etc/bash_completion.d/kubeadm"
done

source /etc/bash_completion.d/kubectl

设置kubelet的bootstrap启动所需的RBAC

  • 当集群开启了 TLS 认证后,每个节点的 kubelet 组件都要使用由 apiserver 使用的 CA 签发的有效证书才能与
    apiserver 通讯;此时如果节点多起来,为每个节点单独签署证书将是一件非常繁琐的事情;TLS bootstrapping 功能就是让 kubelet 先使用一个预定的低权限用户连接到 apiserver,然后向 apiserver 申请证书,kubelet 的证书由 apiserver 动态签署;

  • 在其中一个master节点上执行就可以,以k8s-m1为例

创建工作目录

mkdir -p /root/yaml/tls-bootstrap
cd /root/yaml/tls-bootstrap/

kubelet-bootstrap-rbac.yaml

# 创建yaml文件
cat > kubelet-bootstrap-rbac.yaml <<EOF
# 给予 kubelet-bootstrap 用户进行 node-bootstrapper 的权限
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubelet-bootstrap
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-bootstrapper
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubelet-bootstrap
EOF

tls-bootstrap-clusterrole.yaml

# 创建yaml文件
cat > tls-bootstrap-clusterrole.yaml <<EOF
# A ClusterRole which instructs the CSR approver to approve a node requesting a
# serving cert matching its client cert.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeserver
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/selfnodeserver"]
  verbs: ["create"]
EOF

node-client-auto-approve-csr.yaml

# 创建yaml文件
cat > node-client-auto-approve-csr.yaml <<EOF
# 自动批准 system:bootstrappers 组用户 TLS bootstrapping 首次申请证书的 CSR 请求
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: node-client-auto-approve-csr
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers
EOF

node-client-auto-renew-crt.yaml

# 创建yaml文件
cat > node-client-auto-renew-crt.yaml <<EOF
# 自动批准 system:nodes 组用户更新 kubelet 自身与 apiserver 通讯证书的 CSR 请求
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: node-client-auto-renew-crt
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
EOF

node-server-auto-renew-crt.yaml

# 创建yaml文件
cat > node-server-auto-renew-crt.yaml <<EOF
# 自动批准 system:nodes 组用户更新 kubelet 10250 api 端口证书的 CSR 请求
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: node-server-auto-renew-crt
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeserver
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
EOF

创建tls-bootstrap-rbac

kubectl apply -f .
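创建完成后,可以确认这些RBAC对象已经存在;等后文worker节点的kubelet启动后,还可以用kubectl get csr观察证书请求是否被自动批准(示例):

kubectl get clusterrolebinding kubelet-bootstrap node-client-auto-approve-csr node-client-auto-renew-crt node-server-auto-renew-crt
kubectl get clusterrole system:certificates.k8s.io:certificatesigningrequests:selfnodeserver
# worker节点的kubelet启动后,查看CSR的批准情况
kubectl get csr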

设置kube-apiserver获取node信息的权限

说明

本文部署的kubelet关闭了匿名访问,因此需要额外为kube-apiserver添加权限用于访问kubelet的信息

若没添加此RBAC,则kubectl在执行logs、exec等指令的时候会提示403 Forbidden

kubectl -n kube-system logs calico-node-pc8lq 
Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=proxy) ( pods/log calico-node-pc8lq)

参考文档:Kubelet的认证授权

创建yaml文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
cat > /root/yaml/apiserver-to-kubelet-rbac.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:kube-apiserver-to-kubelet
rules:
- apiGroups:
- ""
resources:
- nodes/proxy
- nodes/stats
- nodes/log
- nodes/spec
- nodes/metrics
verbs:
- "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:kube-apiserver
namespace: ""
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: User
name: kube-apiserver
EOF

创建RBAC

1
kubectl apply -f /root/yaml/apiserver-to-kubelet-rbac.yaml
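创建之后可以简单验证一下授权是否生效(Pod名请替换为实际存在的Pod;kubectl auth can-i的--subresource参数需要kubectl版本支持,以下仅为验证思路示例):

# 之前返回Forbidden的命令,现在应能正常输出日志
kubectl -n kube-system logs <实际的Pod名>
# 模拟kube-apiserver用户检查nodes/proxy子资源的权限,预期输出yes
kubectl auth can-i get nodes --subresource=proxy --as=kube-apiserver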

kubernetes worker节点

worker节点说明

  • 安装Docker-ce,配置与master节点一致即可
  • 安装cni-plugins、kubelet、kube-proxy
  • 关闭防火墙和SELINUX
  • kubelet和kube-proxy运行需要root权限
  • 这里是以k8s-m1、k8s-m2、k8s-m3作为Worker节点加入集群

kubelet

  • 管理容器生命周期、节点状态监控
  • 目前 kubelet 支持三种数据源来获取节点Pod信息:
    • 本地文件
    • 通过 url 从网络上某个地址来获取信息
    • API Server:从 kubernetes master 节点获取信息
  • 使用kubeconfig与kube-apiserver通信
  • 这里启用TLS-Bootstrap实现kubelet证书的动态签署,并自动生成kubeconfig

kube-proxy

  • kube-proxy是实现Service的关键组件,它会在每台节点上运行,监听API Server中Service与Endpoint资源的变化,并依据变化调用相应的模式来实现网络转发
  • kube-proxy可以使用userspace(基本已废弃)、iptables(默认方式)和ipvs来实现数据报文的转发
  • 这里使用的是性能更好、适合大规模使用的ipvs
  • 使用kubeconfig与kube-apiserver通信

切换工作目录

  • k8s-m1上操作
1
cd /root/worker

worker组件配置模板

kubelet.conf

  • --bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig指定bootstrap启动时使用的kubeconfig
  • --network-plugin=cni定义网络插件,Pod生命周期使用此网络插件
  • --node-labels=node-role.kubernetes.io/node=''kubelet注册当前Node时设置的Label,以key=value的格式表示,多个label以逗号分隔
  • --pod-infra-container-image=gcrxio/pause:3.1指定Pod的pause镜像(与下面kubelet.conf中使用的镜像保持一致)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cat > kubelet.conf <<EOF
KUBELET_ARGS=" \\
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \\
--cert-dir=/etc/kubernetes/ssl \\
--config=/etc/kubernetes/kubelet.config.file \\
--cni-conf-dir=/etc/cni/net.d \\
--cni-bin-dir=/opt/cni/bin \\
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
--logtostderr=true \\
--network-plugin=cni \\
--node-labels=node-role.kubernetes.io/node='' \\
--pod-infra-container-image=gcrxio/pause:3.1 \\
--v=2 \\
"
EOF

kubelet.config.file

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
cat > kubelet.config.file <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
address: 0.0.0.0
authentication:
# 匿名访问
anonymous:
enabled: false
webhook:
cacheTTL: 2m0s
enabled: true
x509:
# 这里写kubernetes-ca证书的路径
clientCAFile: /etc/kubernetes/pki/kube-ca.pem
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: 5m0s
cacheUnauthorizedTTL: 30s
# cgroups的驱动,可选systemd和cgroupfs
cgroupDriver: cgroupfs
cgroupsPerQOS: true
# 指定Pod的DNS服务器IP地址
clusterDNS:
- 10.96.0.10
# 集群的域名
clusterDomain: cluster.local
containerLogMaxFiles: 5
containerLogMaxSize: 10Mi
contentType: application/vnd.kubernetes.protobuf
cpuCFSQuota: true
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 10s
enableControllerAttachDetach: true
enableDebuggingHandlers: true
enforceNodeAllocatable:
- pods
eventBurst: 10
eventRecordQPS: 5
# 达到某些阈值之后,kubelet会驱逐Pod
# A set of eviction thresholds (e.g. memory.available<1Gi) that if met would trigger a pod eviction.
# (default imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%)
evictionHard:
imagefs.available: 15%
memory.available: 1000Mi
nodefs.available: 10%
nodefs.inodesFree: 10%
evictionPressureTransitionPeriod: 5m0s
# failSwapOn为true时,检测到系统已启用swap分区kubelet会启动失败;这里设置为false
failSwapOn: false
# 定义feature gates
featureGates:
# kubelet 在证书即将到期时会自动发起一个 renew 自己证书的 CSR 请求
# 其实rotate证书已经默认开启,这里显示定义是为了方便查看
RotateKubeletClientCertificate: true
RotateKubeletServerCertificate: true
# 检查kubelet配置文件变更的间隔
fileCheckFrequency: 20s
# 允许endpoint在尝试访问自己的服务时会被负载均衡分发到自身
# 可选值"promiscuous-bridge", "hairpin-veth" and "none"
# 默认值为promiscuous-bridge
hairpinMode: promiscuous-bridge
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 20s
# 这里定义容器镜像触发回收空间的上限值和下限值
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
imageMinimumGCAge: 2m0s
iptablesDropBit: 15
iptablesMasqueradeBit: 14
kubeAPIBurst: 10
kubeAPIQPS: 5
makeIPTablesUtilChains: true
# kubelet进程最大能打开的文件数量,默认是1000000
maxOpenFiles: 1000000
# 当前节点kubelet所能运行的最大Pod数量
maxPods: 110
# node状态上报间隔
nodeStatusUpdateFrequency: 10s
oomScoreAdj: -999
podPidsLimit: -1
# kubelet服务端口
port: 10250
registryBurst: 10
registryPullQPS: 5
# 指定域名解析文件
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 2m0s
# 拉镜像时,同一时间只拉取一个镜像
# We recommend *not* changing the default value on nodes that run docker daemon with version < 1.9 or an Aufs storage backend. Issue #10959 has more details. (default true)
serializeImagePulls: true
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 4h0m0s
syncFrequency: 1m0s
volumeStatsAggPeriod: 1m0s
EOF

kube-proxy.conf

1
2
3
4
5
6
cat > kube-proxy.conf <<EOF
KUBE_PROXY_ARGS=" \\
--config=/etc/kubernetes/kube-proxy.config.file \\
--v=2 \\
"
EOF

kube-proxy.config.file

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
cat > kube-proxy.config.file <<EOF
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes: ""
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
qps: 5
# 集群中pod的CIDR范围,从这个范围以外发送到服务集群IP的流量将被伪装,从POD发送到外部LoadBalanceIP的流量将被定向到各自的集群IP
clusterCIDR: "10.244.0.0/16"
configSyncPeriod: 15m0s
conntrack:
max: null
# 每个核心最大能跟踪的NAT连接数,默认32768
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
# SNAT所有通过服务集群ip发送的通信
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
excludeCIDRs: null
minSyncPeriod: 0s
# ipvs调度类型,默认是rr
scheduler: "rr"
syncPeriod: 30s
metricsBindAddress: 127.0.0.1:10249
mode: "ipvs"
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms
EOF

systemd服务文件

kubelet.service

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
cat > kubelet.service <<EOF
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/kubelet.conf
ExecStart=/usr/local/bin/kubelet \$KUBELET_ARGS
Restart=on-failure
KillMode=process
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

kube-proxy.service

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
cat > kube-proxy.service <<EOF
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
EnvironmentFile=-/etc/kubernetes/kube-proxy.conf
# 启动前使用ipvsadm设置IPVS的tcp/tcpfin/udp会话超时时间,这里的900 120 300即默认值
ExecStartPre=/usr/sbin/ipvsadm --set 900 120 300
ExecStart=/usr/local/bin/kube-proxy \$KUBE_PROXY_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

分发证书和kubeconfig文件

  • k8s-m1上操作
  • 在worker节点建立对应的目录
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
for NODE in "${!WorkerArray[@]}";do
echo "--- $NODE ---"
echo "--- 创建目录 ---"
ssh $NODE mkdir -p /opt/cni/bin \
/etc/cni/net.d \
/etc/kubernetes/pki \
/etc/kubernetes/manifests \
/var/lib/kubelet
rsync -avpt /root/pki/kube-proxy.kubeconfig \
/root/pki/bootstrap.kubeconfig \
$NODE:/etc/kubernetes/
rsync -avpt /root/pki/kube-ca.pem \
/root/pki/front-proxy-ca.pem \
$NODE:/etc/kubernetes/pki/
done

分发二进制文件

  • k8s-m1上操作
1
2
3
4
5
6
7
8
9
for NODE in "${!WorkerArray[@]}";do
echo "--- $NODE ---"
echo "--- 分发kubernetes二进制文件 ---"
rsync -avpt /root/software/kubernetes/server/bin/kubelet \
/root/software/kubernetes/server/bin/kube-proxy \
$NODE:/usr/local/bin/
echo "--- 分发CNI-Plugins ---"
rsync -avpt /root/software/cni-plugins/* $NODE:/opt/cni/bin/
done

分发配置文件和服务文件

1
2
3
4
5
6
for NODE in "${!WorkerArray[@]}";do
echo "--- $NODE ---"
rsync -avpt kubelet.conf kubelet.config.file kube-proxy.conf kube-proxy.config.file $NODE:/etc/kubernetes/
rsync -avpt kubelet.service kube-proxy.service $NODE:/usr/lib/systemd/system/
ssh $NODE systemctl daemon-reload
done

启动服务

1
2
3
4
for NODE in "${!WorkerArray[@]}";do
echo "--- $NODE ---"
ssh $NODE systemctl enable --now docker.service kubelet.service kube-proxy.service
done
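启动后可以先做一个简单的检查,确认kubelet、kube-proxy处于active状态,且kube-proxy确实工作在ipvs模式(10249为kube-proxy.config.file中metricsBindAddress的端口,以下命令仅供参考):

for NODE in "${!WorkerArray[@]}";do
echo "--- $NODE ---"
ssh $NODE systemctl is-active kubelet.service kube-proxy.service
done
# 在任意节点上确认kube-proxy的代理模式与IPVS规则
curl -s localhost:10249/proxyMode
ipvsadm -Ln | head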

获取节点信息

  • 此时由于未安装网络插件,所以节点状态为NotReady
1
2
3
4
5
6
kubectl get node -o wide
# 示例输出
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-m1 NotReady node 12s v1.11.5 172.16.80.201 <none> CentOS Linux 7 (Core) 3.10.0-957.1.3.el7.x86_64 docker://18.3.1
k8s-m2 NotReady node 12s v1.11.5 172.16.80.202 <none> CentOS Linux 7 (Core) 3.10.0-957.1.3.el7.x86_64 docker://18.3.1
k8s-m3 NotReady node 12s v1.11.5 172.16.80.203 <none> CentOS Linux 7 (Core) 3.10.0-957.1.3.el7.x86_64 docker://18.3.1

kubernetes Core Addons

网络组件部署(二选其一)

  • 只要符合CNI规范的网络组件都可以给kubernetes使用
  • 网络组件清单可以在这里看到Network Plugins
  • 这里只列举kube-flannel和calico,flannel和calico的区别可以自己去找资料
  • 网络组件只能选一个来部署
  • 本文使用kube-flannel部署网络组件,calico已测试可用
  • k8s-m1上操作

创建工作目录

1
mkdir -p /root/yaml/network-plugin/{kube-flannel,calico}

kube-flannel

说明

  • kube-flannel基于VXLAN的方式创建容器二层网络,使用端口8472/UDP通信
  • flannel 第一次启动时,从 etcd(使用--kube-subnet-mgr时则从 kubernetes API)获取 Pod 网段信息,为本节点分配一个未使用的 /24 段地址,然后创建 flannel.1(也可能是其它名称,如 flannel1 等) 接口。
  • 官方提供yaml文件部署为DaemonSet
  • 若需要使用NetworkPolicy功能,可以关注这个项目canal

架构图

切换工作目录

1
cd /root/yaml/network-plugin/kube-flannel

下载yaml文件

1
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
  • 官方yaml文件包含多个平台的daemonset,包括amd64、arm64、arm、ppc64le、s390x

  • 这里以amd64作为例子,其他的可以自行根据需要修改或者直接删除不需要的daemonset

  • 官方yaml文件已经配置好容器网络为10.244.0.0/16,这里需要跟kube-controller-manager.conf里面的--cluster-cidr匹配

  • 如果在kube-controller-manager.conf里面把--cluster-cidr改成了其他地址段,例如192.168.0.0/16,用以下命令替换kube-flannel.yaml相应的字段

1
sed -e 's,"Network": "10.244.0.0/16","Network": "192.168.0.0/16," -i kube-flannel.yml
  • 如果服务器有多个网卡,需要指定网卡用于flannel通信,以网卡ens33为例

    • 在args下面添加一行- --iface=ens33
1
2
3
4
5
6
7
8
9
containers:
- name: kube-flannel
image: quay.io/coreos/flannel:v0.10.0-amd64
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=ens33

修改backend

  • flannel支持多种后端实现,可选值为VXLAN、host-gw、UDP
  • 从性能上,host-gw是最好的,VXLAN和UDP次之
  • 默认值是VXLAN,这里以修改为host-gw为例,位置大概在75行左右
1
2
3
4
5
6
7
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "host-gw"
}
}

部署kube-flannel

1
kubectl apply -f kube-flannel.yml

检查部署情况

1
2
3
4
5
kubectl -n kube-system get pod -l k8s-app=flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-27jwl 2/2 Running 0 59s
kube-flannel-ds-4fgv6 2/2 Running 0 59s
kube-flannel-ds-mvrt7 2/2 Running 0 59s
  • 如果等很久都没Running,可能是quay.io对你来说太慢了
  • 可以替换一下镜像,重新apply
1
2
sed -e 's,quay.io/coreos/,zhangguanzhang/quay.io.coreos.,g' -i kube-flannel.yml
kubectl apply -f kube-flannel.yml
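部署完成后,可以在任意节点上确认flannel已写入子网信息,并根据所选的backend检查对应的接口或路由(以下路径和网段基于本文的默认配置,仅作参考):

# 查看本节点分配到的Pod子网
cat /run/flannel/subnet.env
# host-gw模式:应能看到指向其他节点Pod网段的直连路由
ip route | grep 10.244
# VXLAN模式:应能看到flannel.1接口
ip -d link show flannel.1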

Calico

说明

  • Calico 是一款纯 Layer 3 的网络,节点之间基于BGP协议来通信。
  • 这里以calico-v3.4.0来作为示例
  • 部署文档

架构图

切换工作目录

1
cd /root/yaml/network-plugin/calico

下载yaml文件

  • 这里使用kubernetes API来保存网络信息
1
2
wget https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
wget https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calicoctl.yaml
  • 官方yaml文件默认配置容器网络为192.168.0.0/16,这里需要跟kube-controller-manager.conf里面的--cluster-cidr匹配,需要替换相应字段
1
sed -e "s,192.168.0.0/16,${POD_NET_CIDR},g" -i calico.yaml
  • 官方yaml文件定义calicoctl为Pod,而不是deployment,所以需要调整一下
  • 将kind: Pod修改为kind: Deployment并补充其他字段
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: calicoctl
namespace: kube-system
labels:
k8s-app: calicoctl
spec:
replicas: 1
selector:
matchLabels:
k8s-app: calicoctl
template:
metadata:
name: calicoctl
namespace: kube-system
labels:
k8s-app: calicoctl
spec:
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node.cloudprovider.kubernetes.io/uninitialized
value: "true"
hostNetwork: true
serviceAccountName: calicoctl
containers:
- name: calicoctl
image: quay.io/calico/ctl:v3.4.0
command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
tty: true
env:
- name: DATASTORE_TYPE
value: kubernetes

部署Calico

1
kubectl apply -f /root/yaml/network-plugin/calico/

检查部署情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
kubectl -n kube-system get pod -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE
calico-node-fjcj4 2/2 Running 0 6m
calico-node-tzppt 2/2 Running 0 6m
calico-node-zdq64 2/2 Running 0 6m

kubectl get pod -n kube-system -l k8s-app=calicoctl
NAME READY STATUS RESTARTS AGE
calicoctl-58df8955f6-sp8q9 0/1 Running 0 38s


kubectl -n kube-system exec -it calicoctl-58df8955f6-sp8q9 -- /calicoctl get node -o wide
NAME ASN IPV4 IPV6
k8s-m1 (unknown) 172.16.80.201/24
k8s-m2 (unknown) 172.16.80.202/24
k8s-m3 (unknown) 172.16.80.203/24

kubectl -n kube-system exec -it calicoctl-58df8955f6-sp8q9 -- /calicoctl get profiles -o wide
NAME LABELS
kns.default map[]
kns.kube-public map[]
kns.kube-system map[]
  • 如果镜像pull不下来,可以替换一下
  • 替换完重新apply
1
2
sed -e 's,quay.io/calico/,zhangguanzhang/quay.io.calico.,g' -i *yaml
kubectl apply -f .

检查节点状态

  • 网络组件部署完成之后,可以看到node状态已经为Ready
1
2
3
4
5
kubectl get node 
NAME STATUS ROLES AGE VERSION
k8s-m1 Ready node 1d v1.11.5
k8s-m2 Ready node 1d v1.11.5
k8s-m3 Ready node 1d v1.11.5

服务发现组件部署

  • kubernetes从v1.11之后,已经使用CoreDNS取代原来的kube-dns作为服务发现的组件
  • CoreDNS 是由 CNCF 维护的开源 DNS 方案,前身是 SkyDNS
  • k8s-m1上操作

创建工作目录

1
mkdir -p /root/yaml/coredns
  • 切换工作目录
1
cd /root/yaml/coredns

CoreDNS

创建yaml文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
cat > coredns.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: coredns
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
rules:
- apiGroups:
- ""
resources:
- endpoints
- services
- pods
- namespaces
verbs:
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:coredns
subjects:
- kind: ServiceAccount
name: coredns
namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
log
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
proxy . /etc/resolv.conf
cache 30
reload
loadbalance
}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: coredns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/name: "CoreDNS"
spec:
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
labels:
k8s-app: kube-dns
spec:
serviceAccountName: coredns
priorityClassName: system-cluster-critical
# 使用podAntiAffinity
# CoreDNS的Pod不会被调度到同一台宿主机
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values:
- kube-dns
topologyKey: kubernetes.io/hostname
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/master
containers:
- name: coredns
image: gcrxio/coredns:1.2.6
imagePullPolicy: IfNotPresent
args: [ "-conf", "/etc/coredns/Corefile" ]
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 10
successThreshold: 1
failureThreshold: 5
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 70Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_BIND_SERVICE
drop:
- all
readOnlyRootFilesystem: true
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
readOnly: true
- name: host-time
mountPath: /etc/localtime
dnsPolicy: Default
volumes:
- name: host-time
hostPath:
path: /etc/localtime
- name: config-volume
configMap:
name: coredns
items:
- key: Corefile
path: Corefile
---
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
annotations:
prometheus.io/port: "9153"
prometheus.io/scrape: "true"
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "CoreDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: ${POD_DNS_SERVER_IP}
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
- name: metrics
port: 9153
protocol: TCP
EOF

修改yaml文件

  • yaml文件里面定义的clusterIP需要与kubelet.config.file里面定义的clusterDNS一致
  • 如果kubelet.config.file里面的clusterDNS改成了别的,例如x.x.x.x,这里也要做相应变动,不然Pod找不到DNS,无法正常工作
  • 这里定义静态的hosts解析,这样Pod可以通过hostname来访问到各节点主机
  • 用下面的命令根据HostArray的信息生成静态的hosts解析
1
2
3
4
5
6
7
8
sed -e '57r '<(\
echo ' hosts {'; \
for NODE in "${!HostArray[@]}";do \
echo " ${HostArray[$NODE]} $NODE"; \
done;\
echo ' fallthrough'; \
echo ' }';) \
-i coredns.yaml
  • 上面的命令的作用是,通过HostArray的信息生成hosts解析配置,顺序是打乱的,可以手工调整顺序
  • 也可以手动修改coredns.yaml文件来添加对应字段
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
log
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
hosts {
172.16.80.202 k8s-m2
172.16.80.203 k8s-m3
172.16.80.201 k8s-m1
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
proxy . /etc/resolv.conf
cache 30
reload
loadbalance
}

部署CoreDNS

1
kubectl apply -f coredns.yaml

检查部署状态

1
2
3
4
kubectl -n kube-system get pod -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-5566c96697-6gzzc 1/1 Running 0 45s
coredns-5566c96697-q5slk 1/1 Running 0 45s

验证集群DNS服务

  • 创建一个deployment测试DNS解析
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# 创建一个基于busybox的deployment
cat > /root/yaml/busybox-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: busybox
name: busybox
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: busybox
template:
metadata:
labels:
app: busybox
spec:
containers:
- name: busybox
imagePullPolicy: IfNotPresent
image: busybox:1.26
command:
- sleep
- "3600"
EOF

# 基于文件创建deployment
kubectl apply -f /root/yaml/busybox-deployment.yaml
  • 检查deployment部署情况
1
2
3
kubectl get pod
NAME READY STATUS RESTARTS AGE
busybox-7b9bfb5658-872gj 1/1 Running 0 6s
  • 验证集群DNS解析
  • 上一个命令可以获取到Pod的名字(示例中为busybox-7b9bfb5658-872gj),执行下面的命令时请替换为实际的Pod名
  • 通过kubectl命令连接到Pod运行nslookup命令测试使用域名来访问kube-apiserver和各节点主机
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
echo "--- 通过CoreDNS访问kubernetes ---"
kubectl exec -it busybox-7b9bfb5658-4cz94 -- nslookup kubernetes
# 示例输出
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local


echo "--- 通过CoreDNS访问k8s-m1 ---"
# 示例输出
kubectl exec -it busybox-7b9bfb5658-4cz94 -- nslookup k8s-m1
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: k8s-m1
Address 1: 172.16.80.201 k8s-m1


echo "--- 通过CoreDNS访问k8s-m2 ---"
kubectl exec -it busybox-7b9bfb5658-4cz94 -- nslookup k8s-m2
# 示例输出
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: k8s-m2
Address 1: 172.16.80.202 k8s-m2


echo "--- 通过CoreDNS访问并不存在的k8s-n3 ---"
kubectl exec -it busybox-7b9bfb5658-4cz94 -- nslookup k8s-n3
# 示例输出
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'k8s-n3'

Metrics Server

  • Metrics Server
    是实现了 Metrics API 的组件,其目标是取代 Heapster,为 Pod 与 Node 提供资源的 Usage
    metrics,该组件会从每个 Kubernetes 节点上的 Kubelet 所公开的 Summary API 中收集 Metrics
  • Horizontal Pod Autoscaler(HPA)控制器用于实现基于CPU使用率进行自动Pod伸缩的功能。
  • HPA控制器基于Master的kube-controller-manager服务启动参数--horizontal-pod-autoscaler-sync-period定义的时长(默认30秒),周期性监控目标Pod的CPU使用率,并在满足条件时对ReplicationController或Deployment中的Pod副本数进行调整,以符合用户定义的平均Pod
    CPU使用率。
  • 在新版本的kubernetes中,Pod CPU使用率不再来源于heapster,而是来自于metrics-server
  • 官网原话是 The --horizontal-pod-autoscaler-use-rest-clients is true or unset. Setting this to false switches to Heapster-based autoscaling, which is deprecated.
  • k8s-m1上操作

额外参数

  • 设置kube-apiserver参数,这里在配置kube-apiserver阶段已经加进去了
  • front-proxy证书,在证书生成阶段已经完成且已分发
1
2
3
4
5
6
7
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem
--requestheader-allowed-names=aggregator
--requestheader-group-headers=X-Remote-Group
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-username-headers=X-Remote-User

创建工作目录

1
mkdir -p /root/yaml/metrics-server

切换工作目录

1
cd /root/yaml/metrics-server

下载yaml文件

1
2
3
4
5
6
wget https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/aggregated-metrics-reader.yaml
wget https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/auth-delegator.yaml
wget https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/auth-reader.yaml
wget https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/metrics-apiservice.yaml
wget https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/metrics-server-service.yaml
wget https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/resource-reader.yaml

创建metrics-server-deployment.yaml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
cat > metrics-server-deployment.yaml <<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: metrics-server
namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: metrics-server
namespace: kube-system
labels:
k8s-app: metrics-server
spec:
selector:
matchLabels:
k8s-app: metrics-server
template:
metadata:
name: metrics-server
labels:
k8s-app: metrics-server
spec:
serviceAccountName: metrics-server
volumes:
# mount in tmp so we can safely use from-scratch images and/or read-only containers
- name: ca-ssl
hostPath:
path: /etc/kubernetes/pki
containers:
- name: metrics-server
image: gcrxio/metrics-server-amd64:v0.3.1
imagePullPolicy: IfNotPresent
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-port=10250
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem
- --requestheader-username-headers=X-Remote-User
- --requestheader-group-headers=X-Remote-Group
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --kubelet-insecure-tls
- -v=2
volumeMounts:
- name: ca-ssl
mountPath: /etc/kubernetes/pki
EOF

部署metrics-server

1
kubectl apply -f .

查看pod状态

1
2
3
kubectl -n kube-system get pod -l k8s-app=metrics-server
NAME READY STATUS RESTARTS AGE
pod/metrics-server-86bd9d7667-5hbn6 1/1 Running 0 1m

验证metrics

  • 完成后,等待一段时间(约 30s - 1m)收集 Metrics
1
2
3
4
5
6
7
8
9
10
11
12
13
# 请求metrics api的结果
kubectl get --raw /apis/metrics.k8s.io/v1beta1
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"metrics.k8s.io/v1beta1","resources":[{"name":"nodes","singularName":"","namespaced":false,"kind":"NodeMetrics","verbs":["get","list"]},{"name":"pods","singularName":"","namespaced":true,"kind":"PodMetrics","verbs":["get","list"]}]}

kubectl get apiservice|grep metrics
v1beta1.metrics.k8s.io 2018-12-09T08:17:26Z

# 获取节点性能信息
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-m1 113m 2% 1080Mi 14%
k8s-m2 133m 3% 1086Mi 14%
k8s-m3 100m 2% 1029Mi 13%
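也可以查看Pod级别的资源使用,并用前面创建的busybox做一个简单的HPA演示(仅作示例;busybox未设置resources.requests,HPA不一定能正常计算,正式使用时请给目标Deployment配置requests):

# 查看kube-system命名空间下Pod的资源使用
kubectl top pod -n kube-system
# 为busybox创建一个示例HPA并查看状态
kubectl autoscale deployment busybox --min=1 --max=3 --cpu-percent=80
kubectl get hpa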

至此集群已具备基本功能

下面的Extra Addons就是一些额外的功能

kubernetes Extra Addons

Dashboard

  • Dashboard 是kubernetes社区提供的GUI界面,用于图形化管理kubernetes集群,同时可以看到资源报表。
  • 官方提供yaml文件直接部署,但是需要更改image以便国内部署
  • k8s-m1上操作

创建工作目录

1
mkdir -p /root/yaml/kubernetes-dashboard

切换工作目录

1
cd /root/yaml/kubernetes-dashboard

获取yaml文件

1
wget https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml

修改镜像地址

1
sed -e 's,k8s.gcr.io/kubernetes-dashboard-amd64,gcrxio/kubernetes-dashboard-amd64,g' -i kubernetes-dashboard.yaml

创建kubernetes-Dashboard

1
kubectl apply -f kubernetes-dashboard.yaml

创建ServiceAccount RBAC

  • 官方的yaml文件,ServiceAccount绑定的RBAC权限很低,很多资源无法查看
  • 需要创建一个用于管理全局的ServiceAccount
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
cat > cluster-admin.yaml <<EOF
---
# 在kube-system中创建名为admin-user的ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kube-system
---
# 将admin-user和cluster-admin绑定在一起
# cluster-admin是kubernetes内置的clusterrole,具有集群管理员权限
# 其他内置的clusterrole可以通过kubectl get clusterrole查看
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kube-system
EOF

kubectl apply -f cluster-admin.yaml

获取ServiceAccount的Token

1
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

查看部署情况

1
kubectl get all -n kube-system --selector k8s-app=kubernetes-dashboard

访问Dashboard

  • kubernetes-dashboard的svc默认是ClusterIP,需要修改为NodePort才能被外部访问
  • 随机分配NodePort,分配范围由kube-apiserver--service-node-port-range参数指定
1
kubectl patch -n kube-system svc kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'
  • 修改完之后,通过以下命令获取访问kubernetes-Dashboard的端口
1
2
3
kubectl -n kube-system get svc --selector k8s-app=kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.106.183.192 <none> 443:30216/TCP 12s
  • 可以看到已经将节点的30216端口暴露出来

Dashboard UI预览图

Ingress Controller

  • Ingress 是 Kubernetes 中的一个抽象资源,其功能是通过 Web Server 的 Virtual Host
    概念以域名(Domain Name)方式转发到內部 Service,这避免了使用 Service 中的 NodePort 与
    LoadBalancer 类型所带來的限制(如 Port 数量上限),而实现 Ingress 功能则是通过 Ingress Controller
    来达成,它会负责监听 Kubernetes API 中的 Ingress 与 Service 资源,并在发生资源变化时,根据资源预期的结果来设置 Web Server。
  • Ingress Controller 有许多实现可以选择,这里只是列举一小部分
    • Ingress NGINX:Kubernetes 官方维护的方案,本次安装使用此方案
    • kubernetes-ingress:由nginx社区维护的方案,使用社区版nginx和nginx-plus
    • treafik:一款开源的反向代理与负载均衡工具。它最大的优点是能够与常见的微服务系统直接整合,可以实现自动化动态配置
  • k8s-m1上操作

创建工作目录

1
mkdir -p /root/yaml/ingress/ingress-nginx

切换工作目录

1
cd /root/yaml/ingress/ingress-nginx

下载yaml文件

1
2
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.20.0/deploy/mandatory.yaml
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.20.0/deploy/provider/baremetal/service-nodeport.yaml

修改镜像地址

1
2
3
sed -e 's,k8s.gcr.io/,zhangguanzhang/gcr.io.google_containers.,g' \
-e 's,quay.io/kubernetes-ingress-controller/,zhangguanzhang/quay.io.kubernetes-ingress-controller.,g' \
-i mandatory.yaml

创建ingress-nginx

1
kubectl apply -f .

检查部署情况

1
kubectl -n ingress-nginx get pod

访问ingress

  • 默认的backend会返回404
1
2
3
4
5
6
7
8
9
kubectl -n ingress-nginx get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx NodePort 10.96.250.140 <none> 80:32603/TCP,443:30083/TCP 1m

curl http://172.16.80.200:32603
default backend - 404

curl -k https://172.16.80.200:30083
default backend - 404
注意
  • 这里部署之后,是deployment,且通过nodePort暴露服务

  • 也可以修改yaml文件,将Ingress-nginx部署为DaemonSet

    • 使用labels和nodeSelector来指定运行ingress-nginx的节点
    • 使用hostNetwork=true来共享主机网络命名空间,或者使用hostPort指定主机端口映射
    • 如果使用hostNetwork共享宿主机网络栈或者hostPort映射宿主机端口,记得要看看有没有端口冲突,否则无法启动
    • 修改监听端口可以在ingress-nginx启动命令中添加--http-port=8180和--https-port=8543,还有下面的端口定义也相应变更即可

创建kubernetes-Dashboard的Ingress

  • kubernetes-Dashboard默认是开启了HTTPS访问的
  • ingress-nginx需要以HTTPS的方式反向代理kubernetes-Dashboard
  • 以HTTP方式访问kubernetes-Dashboard的时候会被重定向到HTTPS
  • 需要创建HTTPS证书,用于访问ingress-nginx的HTTPS端口

创建HTTPS证书

  • 这里的CN=域名/O=域名需要跟后面的ingress主机名匹配
1
2
3
4
5
6
7
openssl req -x509 \
-nodes \
-days 3650 \
-newkey rsa:2048 \
-keyout tls.key \
-out tls.crt \
-subj "/CN=dashboard.k8s.local/O=dashboard.k8s.local"

创建secret对象

  • 这里将HTTPS证书创建为kubernetes的secret对象dashboard-tls
  • ingress创建的时候需要加载这个作为HTTPS证书
1
kubectl -n kube-system create secret tls dashboard-tls --key ./tls.key --cert ./tls.crt
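创建完成后可以确认secret已存在,类型应为kubernetes.io/tls,DATA为2(即tls.crt和tls.key):

kubectl -n kube-system get secret dashboard-tls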

创建dashboard-ingress.yaml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: dashboard-ingress
namespace: kube-system
annotations:
nginx.ingress.kubernetes.io/ssl-passthrough: "true"
nginx.ingress.kubernetes.io/secure-backends: "true"
spec:
tls:
- hosts:
- dashboard.k8s.local
secretName: dashboard-tls
rules:
- host: dashboard.k8s.local
http:
paths:
- path: /
backend:
serviceName: kubernetes-dashboard
servicePort: 443

创建ingress

1
kubectl apply -f dashboard-ingress.yaml

检查ingress

1
2
3
kubectl -n kube-system get ingress
NAME HOSTS ADDRESS PORTS AGE
dashboard-ingress dashboard.k8s.local 80, 443 16m

访问kubernetes-Dashboard

  • 修改主机hosts静态域名解析,以本文为例在hosts文件里添加172.16.80.200 dashboard.k8s.local
  • 即可使用https://dashboard.k8s.local:30083访问kubernetes-Dashboard(不修改hosts的快速验证方式见本节末尾示例)
  • 添加了TLS之后,访问HTTP会被跳转到HTTPS端口,这里比较坑爹,没法自定义跳转HTTPS的端口
  • 此处使用的是自签名证书,浏览器会提示不安全,请忽略
  • 建议搭配external-DNS和LoadBalancer一起食用,效果更佳
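如果暂时不想修改hosts文件,也可以用curl的--resolve参数做一次快速验证(VIP与端口请按实际环境替换,自签名证书需要-k忽略校验):

curl -k --resolve dashboard.k8s.local:30083:172.16.80.200 https://dashboard.k8s.local:30083/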

Helm

  • Helm是一个kubernetes应用的包管理工具,用来管理charts——预先配置好的安装包资源,有点类似于Ubuntu的APT和CentOS中的yum。
  • Helm chart是用来封装kubernetes原生应用程序的yaml文件,可以在你部署应用的时候自定义应用程序的一些metadata,便于应用程序的分发。
  • Helm和charts的主要作用:
    • 应用程序封装
    • 版本管理
    • 依赖检查
    • 便于应用程序分发

环境要求

  • kubernetes v1.6及以上的版本,启用RBAC
  • 集群可以访问到chart仓库
  • helm客户端主机能访问kubernetes集群

安装客户端

安装方式二选一,需要科学上网

直接脚本安装

1
2
echo '--- 使用脚本安装,默认是最新版 ---'
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get | bash

下载二进制文件安装

1
2
3
4
5
echo '--- 下载二进制文件安装 ---'
wget https://storage.googleapis.com/kubernetes-helm/helm-v2.12.0-linux-amd64.tar.gz
tar xzf helm-v2.12.0-linux-amd64.tar.gz linux-amd64/helm
mv linux-amd64/helm /usr/local/bin/
rm -rf linux-amd64

创建工作目录

1
mkdir /root/yaml/helm/

切换工作目录

1
cd /root/yaml/helm

创建RBAC规则

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
cat > /root/yaml/helm/helm-rbac.yaml <<EOF
# 创建名为tiller的ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: kube-system
---
# 给tiller绑定cluster-admin权限
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: tiller-cluster-rule
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: tiller
namespace: kube-system
EOF

kubectl apply -f /root/yaml/helm/helm-rbac.yaml

安装服务端

  • 这里指定了helm的stable repo国内镜像地址
  • 具体说明请看这里
1
2
3
helm init --tiller-image gcrxio/tiller:v2.12.0 \
--service-account tiller \
--stable-repo-url http://mirror.azure.cn/kubernetes/charts/

检查安装情况

1
2
3
4
5
6
7
8
9
kubectl -n kube-system get pod -l app=helm,name=tiller
# 输出示例
NAME READY STATUS RESTARTS AGE
tiller-deploy-84fc6cd5f9-nz4m7 1/1 Running 0 1m

helm version
# 输出示例
Client: &version.Version{SemVer:"v2.12.0", GitCommit:"d325d2a9c179b33af1a024cdb5a4472b6288016a", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.0", GitCommit:"d325d2a9c179b33af1a024cdb5a4472b6288016a", GitTreeState:"clean"}

添加命令行补全

1
2
helm completion bash  > /etc/bash_completion.d/helm
source /etc/bash_completion.d/helm
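安装完成后可以简单验证仓库和chart检索是否正常(输出内容取决于仓库状态,仅作参考):

# 更新本地repo索引并查看已配置的仓库
helm repo update
helm repo list
# 搜索一个常见的chart,确认stable仓库可用
helm search nginx-ingress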

Rook(测试用途)

说明

  • Rook是一款云原生环境下的开源分布式存储编排系统,目前已进入CNCF孵化。Rook的官方网站是https://rook.io
  • Rook将分布式存储软件转变为自我管理,自我缩放和自我修复的存储服务。它通过自动化部署,引导、配置、供应、扩展、升级、迁移、灾难恢复、监控和资源管理来实现。 Rook使用基础的云原生容器管理、调度和编排平台提供的功能来履行其职责。
  • Rook利用扩展点深入融入云原生环境,为调度、生命周期管理、资源管理、安全性、监控和用户体验提供无缝体验。
  • Ceph Custom Resource Definition(CRD)已经在Rook v0.8版本升级到Beta
  • 其他特性请查看项目文档
  • 这里只用作测试环境中提供StorageClass和持久化存储
  • 请慎重考虑是否部署在生产环境中

Rook与kubernetes的集成

Rook架构图

安装

  • 这里以Rook v0.8.3作为示例
  • 这里默认使用/var/lib/rook/osd*目录来运行OSD
  • 需要最少3个节点,否则无足够的节点启动集群
  • 可以使用yaml文件部署和使用helm chart部署,这里使用yaml文件部署

创建工作目录

1
mkdir -p /root/yaml/rook/

进入工作目录

1
cd /root/yaml/rook/

下载yaml文件

1
2
3
4
# operator实现自定义API用于管理rook-ceph
wget https://raw.githubusercontent.com/rook/rook/v0.8.3/cluster/examples/kubernetes/ceph/operator.yaml
# cluster用于部署rook-ceph集群
wget https://raw.githubusercontent.com/rook/rook/v0.8.3/cluster/examples/kubernetes/ceph/cluster.yaml

部署operator

1
kubectl apply -f operator.yaml

检查operator安装情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
kubectl -n rook-ceph-system get all
# 输出示例
NAME READY STATUS RESTARTS AGE
pod/rook-ceph-agent-4qwvd 1/1 Running 0 11m
pod/rook-ceph-agent-v5ghj 1/1 Running 0 11m
pod/rook-ceph-agent-zv8s6 1/1 Running 0 11m
pod/rook-ceph-operator-745f756bd8-9gdpk 1/1 Running 0 12m
pod/rook-discover-44lx5 1/1 Running 0 11m
pod/rook-discover-4d6mn 1/1 Running 0 11m
pod/rook-discover-mvqfv 1/1 Running 0 11m

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/rook-ceph-agent 3 3 3 3 3 <none> 11m
daemonset.apps/rook-discover 3 3 3 3 3 <none> 11m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/rook-ceph-operator 1 1 1 1 12m

NAME DESIRED CURRENT READY AGE
replicaset.apps/rook-ceph-operator-745f756bd8 1 1 1 12m

部署cluster

1
kubectl apply -f cluster.yaml

检查cluster部署情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
kubectl -n rook-ceph get all
# 输出示例
NAME READY STATUS RESTARTS AGE
pod/rook-ceph-mgr-a-7944d8d79b-pvrsf 1/1 Running 0 10m
pod/rook-ceph-mon0-ll7fc 1/1 Running 0 11m
pod/rook-ceph-mon1-cd2gb 1/1 Running 0 11m
pod/rook-ceph-mon2-vlmfc 1/1 Running 0 10m
pod/rook-ceph-osd-id-0-745486df7b-4dxdc 1/1 Running 0 10m
pod/rook-ceph-osd-id-1-85fdf4cd64-ftmc4 1/1 Running 0 10m
pod/rook-ceph-osd-id-2-6bc4fbb457-295pn 1/1 Running 0 10m
pod/rook-ceph-osd-prepare-k8s-m1-klv5j 0/1 Completed 0 10m
pod/rook-ceph-osd-prepare-k8s-m2-dt2pl 0/1 Completed 0 10m
pod/rook-ceph-osd-prepare-k8s-m3-ndqpl 0/1 Completed 0 10m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/rook-ceph-mgr ClusterIP 10.100.158.219 <none> 9283/TCP 10m
service/rook-ceph-mgr-dashboard ClusterIP 10.107.141.138 <none> 7000/TCP 10m
service/rook-ceph-mgr-dashboard-external NodePort 10.99.89.12 <none> 7000:30660/TCP 10m
service/rook-ceph-mon0 ClusterIP 10.100.50.229 <none> 6790/TCP 11m
service/rook-ceph-mon1 ClusterIP 10.110.105.207 <none> 6790/TCP 11m
service/rook-ceph-mon2 ClusterIP 10.103.223.166 <none> 6790/TCP 10m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/rook-ceph-mgr-a 1 1 1 1 10m
deployment.apps/rook-ceph-osd-id-0 1 1 1 1 10m
deployment.apps/rook-ceph-osd-id-1 1 1 1 1 10m
deployment.apps/rook-ceph-osd-id-2 1 1 1 1 10m

NAME DESIRED CURRENT READY AGE
replicaset.apps/rook-ceph-mgr-a-7944d8d79b 1 1 1 10m
replicaset.apps/rook-ceph-mon0 1 1 1 11m
replicaset.apps/rook-ceph-mon1 1 1 1 11m
replicaset.apps/rook-ceph-mon2 1 1 1 10m
replicaset.apps/rook-ceph-osd-id-0-745486df7b 1 1 1 10m
replicaset.apps/rook-ceph-osd-id-1-85fdf4cd64 1 1 1 10m
replicaset.apps/rook-ceph-osd-id-2-6bc4fbb457 1 1 1 10m

NAME DESIRED SUCCESSFUL AGE
job.batch/rook-ceph-osd-prepare-k8s-m1 1 1 10m
job.batch/rook-ceph-osd-prepare-k8s-m2 1 1 10m
job.batch/rook-ceph-osd-prepare-k8s-m3 1 1 10m

检查ceph集群状态

  • 上面命令已经获取ceph-mon0节点的pod名rook-ceph-mon0-ll7fc,以此pod为例运行以下命令
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
kubectl -n rook-ceph exec -it rook-ceph-mon0-ll7fc -- ceph -s
# 输出示例
cluster:
id: 1fcee02c-fd98-4b13-bfed-de7b6605a237
health: HEALTH_OK

services:
mon: 3 daemons, quorum rook-ceph-mon0,rook-ceph-mon2,rook-ceph-mon1
mgr: a(active)
osd: 3 osds: 3 up, 3 in

data:
pools: 1 pools, 100 pgs
objects: 0 objects, 0 bytes
usage: 22767 MB used, 96979 MB / 116 GB avail
pgs: 100 active+clean

暴露ceph-mgr的dashboard

1
2
wget https://raw.githubusercontent.com/rook/rook/v0.8.3/cluster/examples/kubernetes/ceph/dashboard-external.yaml
kubectl apply -f dashboard-external.yaml

访问已暴露的dashboard

1
2
3
4
5
6
7
8
9
kubectl -n rook-ceph get svc
# 输出示例
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.100.158.219 <none> 9283/TCP 12m
rook-ceph-mgr-dashboard ClusterIP 10.107.141.138 <none> 7000/TCP 12m
rook-ceph-mgr-dashboard-external NodePort 10.99.89.12 <none> 7000:30660/TCP 11m
rook-ceph-mon0 ClusterIP 10.100.50.229 <none> 6790/TCP 13m
rook-ceph-mon1 ClusterIP 10.110.105.207 <none> 6790/TCP 13m
rook-ceph-mon2 ClusterIP 10.103.223.166 <none> 6790/TCP 12m
  • 可以见到这里暴露30660端口,通过此端口可以访问Dashboard

添加StorageClass

  • 添加多副本存储池
  • 注释部分是创建纠删码存储池
  • 添加StorageClass指定使用多副本存储池,格式化为xfs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
cat > rbd-storageclass.yaml <<EOF
apiVersion: ceph.rook.io/v1beta1
kind: Pool
metadata:
name: replicapool
namespace: rook-ceph
spec:
replicated:
size: 3
# For an erasure-coded pool, comment out the replication size above and uncomment the following settings.
# Make sure you have enough OSDs to support the replica size or erasure code chunks.
#erasureCoded:
# dataChunks: 2
# codingChunks: 1
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
pool: replicapool
# Specify the namespace of the rook cluster from which to create volumes.
# If not specified, it will use `rook` as the default namespace of the cluster.
# This is also the namespace where the cluster will be
clusterNamespace: rook-ceph
# Specify the filesystem type of the volume. If not specified, it will use `ext4`.
fstype: xfs
EOF

kubectl apply -f rbd-storageclass.yaml
  • 还可以添加cephFS、object类型的存储池,然后创建对应的StorageClass

具体可以看filesystem.yaml和object.yaml

检查StorageClass

  • 创建sc时,会在rook-ceph上创建对应的Pool
  • 这里以rbd-storageclass.yaml为例
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
kubectl get sc
# 输出示例
NAME PROVISIONER AGE
rook-ceph-block ceph.rook.io/block 15m

kubectl describe sc rook-ceph-block
# 输出示例
Name: rook-ceph-block
IsDefaultClass: No
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"rook-ceph-block","namespace":""},"parameters":{"clusterNamespace":"rook-ceph","fstype":"xfs","pool":"replicapool"},"provisioner":"ceph.rook.io/block"}

Provisioner: ceph.rook.io/block
Parameters: clusterNamespace=rook-ceph,fstype=xfs,pool=replicapool
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>

kubectl -n rook-ceph exec -it rook-ceph-mon0-ll7fc -- ceph df
# 输出示例
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
116G 96979M 22767M 19.01
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
replicapool 1 0 0 29245M 0
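还可以创建一个PVC来验证StorageClass的动态供给是否正常(PVC名称与大小仅为示例,验证完即可删除):

cat > /root/yaml/rook/test-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 1Gi
EOF

kubectl apply -f /root/yaml/rook/test-pvc.yaml
# STATUS变为Bound即说明动态供给正常
kubectl get pvc rbd-test-pvc
kubectl delete -f /root/yaml/rook/test-pvc.yaml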

卸载Rook-ceph

  • 这里提供卸载的操作步骤,请按需操作!

删除StorageClass

1
kubectl delete -f rbd-storageclass.yaml

删除Rook-Ceph-Cluster

1
kubectl delete -f cluster.yaml

删除Rook-Operator

1
kubectl delete -f operator.yaml

清理目录

  • 注意!这里是所有运行rook-ceph集群的节点都需要做清理
1
rm -rf /var/lib/rook
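如果希望在k8s-m1上一次性清理所有节点,可以沿用前文定义的WorkerArray变量批量执行(示例):

for NODE in "${!WorkerArray[@]}";do
echo "--- $NODE ---"
ssh $NODE rm -rf /var/lib/rook
done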

Prometheus Operator

说明

  • Prometheus Operator 是 CoreOS 开发的基于 Prometheus 的 Kubernetes 监控方案,也可能是目前功能最全面的开源方案。
  • Prometheus Operator 通过 Grafana 展示监控数据,预定义了一系列的 Dashboard
  • 要求kubernetes版本大于等于1.8.0
  • CoreOS/Prometheus-Operator项目地址

Prometheus

  • Prometheus 是一套开源的系统监控报警框架,启发于 Google 的 borgmon 监控系统,作为社区开源项目进行开发,并成为CNCF第二个毕业的项目(第一个是kubernetes)
  • 特点
    • 强大的多维度数据模型
    • 灵活而强大的查询语句(PromQL)
    • 易于管理,高效
    • 使用 pull 模式采集时间序列数据,这样不仅有利于本机测试而且可以避免有问题的服务器推送坏的 metrics。
    • 可以采用 push gateway 的方式把时间序列数据推送至 Prometheus server 端
    • 可以通过服务发现或者静态配置去获取监控的 targets
    • 有多种可视化图形界面
    • 易于伸缩

Prometheus组成架构

  • Prometheus Server: 用于收集和存储时间序列数据
  • Client Library: 客户端库,为需要监控的服务生成相应的 metrics 并暴露给
    Prometheus server
  • Push Gateway: 主要用于短期的 jobs。 jobs 可以直接向 Prometheus server 端推送它们的
    metrics。这种方式主要用于服务层面的 metrics。
  • Exporters: 用于暴露已有的第三方服务的 metrics 给 Prometheus。
  • Alertmanager: 从 Prometheus server 端接收到 alerts
    后,会进行去除重复数据,分组,并路由到对应的接收方式,发出报警。

架构图

Operator架构

  • Operator

    即 Prometheus Operator,在 Kubernetes 中以 Deployment 运行。其职责是部署和管理
    Prometheus Server,根据 ServiceMonitor 动态更新 Prometheus Server 的监控对象。

  • Prometheus Server

    Prometheus Server 会作为 Kubernetes 应用部署到集群中。为了更好地在 Kubernetes 中管理 Prometheus,CoreOS 的开发人员专门定义了一个命名为 Prometheus 类型的 Kubernetes 定制化资源。我们可以把 Prometheus看作是一种特殊的 Deployment,它的用途就是专门部署 Prometheus Server。

  • Service

    这里的
    Service 就是 Cluster 中的 Service 资源,也是 Prometheus 要监控的对象,在 Prometheus 中叫做
    Target。每个监控对象都有一个对应的 Service。比如要监控 Kubernetes Scheduler,就得有一个与 Scheduler
    对应的 Service。当然,Kubernetes 集群默认是没有这个 Service 的,Prometheus Operator
    会负责创建。

  • ServiceMonitor

    Operator
    能够动态更新 Prometheus 的 Target 列表,ServiceMonitor 就是 Target 的抽象。比如想监控
    Kubernetes Scheduler,用户可以创建一个与 Scheduler Service 相映射的 ServiceMonitor
    对象。Operator 则会发现这个新的 ServiceMonitor,并将 Scheduler 的 Target 添加到 Prometheus
    的监控列表中。

    ServiceMonitor 也是 Prometheus Operator 专门开发的一种 Kubernetes 定制化资源类型。

  • Alertmanager

    除了 Prometheus 和 ServiceMonitor,Alertmanager 是 Operator 开发的第三种 Kubernetes 定制化资源。我们可以把 Alertmanager 看作是一种特殊的 Deployment,它的用途就是专门部署 Alertmanager 组件。

部署Prometheus-Operator

切换工作目录

1
2
mkdir -p /root/yaml/prometheus-operator
cd /root/yaml/prometheus-operator

添加coreos源

1
2
# 添加coreos源
helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/

创建命名空间

1
kubectl create namespace monitoring

部署prometheus-operator

  • 这里通过--set指定了image的地址
1
2
3
4
5
6
7
helm install coreos/prometheus-operator \
--name coreos-prometheus-operator \
--namespace monitoring \
--set global.hyperkube.repository=zhangguanzhang/quay.io.coreos.hyperkube \
--set image.repository=zhangguanzhang/quay.io.coreos.prometheus-operator \
--set prometheusConfigReloader.repository=zhangguanzhang/quay.io.coreos.prometheus-config-reloader \
--set rbacEnable=true
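安装后可以先确认operator的release和Pod状态,再继续部署kube-prometheus(输出仅作参考):

# 查看release状态
helm ls coreos-prometheus-operator
# 查看operator的Pod是否Running
kubectl -n monitoring get pod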

部署kube-prometheus

  • 通过运行helm命令安装时,指定一些变量来达到自定义配置的目的
  • 定义grafana初始admin密码为password,默认值是admin
  • 定义alertmanager和prometheus使用名为rook-ceph-block的StorageClass,访问模式为ReadWriteOnce,大小5Gi,默认是50Gi
  • 定义grafana、alertmanager、prometheus的Service类型为NodePort,默认是ClusterIP
  • 这里的--set可以定义很多变量,具体可以在这里,查看里面每个文件夹的values.yaml
  • 这里配置的变量请自己根据情况修改
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
helm install coreos/kube-prometheus \
--name kube-prometheus \
--namespace monitoring \
--set alertmanager.image.repository="zhangguanzhang/quay.io.prometheus.alertmanager" \
--set alertmanager.service.type="NodePort" \
--set alertmanager.storageSpec.volumeClaimTemplate.spec.storageClassName="rook-ceph-block" \
--set alertmanager.storageSpec.volumeClaimTemplate.spec.accessModes[0]="ReadWriteOnce" \
--set alertmanager.storageSpec.volumeClaimTemplate.spec.resources.requests.storage="5Gi" \
--set grafana.adminPassword="password" \
--set grafana.service.type="NodePort" \
--set prometheus.image.repository="zhangguanzhang/quay.io.prometheus.prometheus" \
--set prometheus.service.type="NodePort" \
--set prometheus.storageSpec.volumeClaimTemplate.spec.storageClassName="rook-ceph-block" \
--set prometheus.storageSpec.volumeClaimTemplate.spec.accessModes[0]="ReadWriteOnce" \
--set prometheus.storageSpec.volumeClaimTemplate.spec.resources.requests.storage="5Gi" \
--set prometheus.deployCoreDNS=true \
--set prometheus.deployKubeDNS=false \
--set prometheus.deployKubeEtcd=true \
--set exporter-kube-controller-manager.endpoints[0]="172.16.80.201" \
--set exporter-kube-controller-manager.endpoints[1]="172.16.80.202" \
--set exporter-kube-controller-manager.endpoints[2]="172.16.80.203" \
--set exporter-kube-etcd.etcdPort=2379 \
--set exporter-kube-etcd.scheme="https" \
--set exporter-kube-etcd.endpoints[0]="172.16.80.201" \
--set exporter-kube-etcd.endpoints[1]="172.16.80.202" \
--set exporter-kube-etcd.endpoints[2]="172.16.80.203" \
--set exporter-kube-scheduler.endpoints[0]="172.16.80.201" \
--set exporter-kube-scheduler.endpoints[1]="172.16.80.202" \
--set exporter-kube-scheduler.endpoints[2]="172.16.80.203" \
--set exporter-kube-state.kube_state_metrics.image.repository="gcrxio/kube-state-metrics" \
--set exporter-kube-state.addon_resizer.image.repository="gcrxio/addon-resizer"

检查部署情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
kubectl -n monitoring get all
# 输出示例
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-0 2/2 Running 0 43m
pod/kube-prometheus-exporter-kube-state-66b8849c9b-cq5pp 2/2 Running 0 42m
pod/kube-prometheus-exporter-node-p6z67 1/1 Running 0 43m
pod/kube-prometheus-exporter-node-qnmjt 1/1 Running 0 43m
pod/kube-prometheus-exporter-node-vr4sp 1/1 Running 0 43m
pod/kube-prometheus-grafana-f869c754-x5x7n 2/2 Running 0 43m
pod/prometheus-kube-prometheus-0 3/3 Running 1 43m
pod/prometheus-operator-5db9df7ffc-dxtqh 1/1 Running 0 49m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 43m
service/kube-prometheus NodePort 10.97.183.252 <none> 9090:30900/TCP 43m
service/kube-prometheus-alertmanager NodePort 10.105.140.173 <none> 9093:30903/TCP 43m
service/kube-prometheus-exporter-kube-state ClusterIP 10.108.236.146 <none> 80/TCP 43m
service/kube-prometheus-exporter-node ClusterIP 10.96.14.75 <none> 9100/TCP 43m
service/kube-prometheus-grafana NodePort 10.109.4.170 <none> 80:30164/TCP 43m
service/prometheus-operated ClusterIP None <none> 9090/TCP 43m

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-prometheus-exporter-node 3 3 3 3 3 <none> 43m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-prometheus-exporter-kube-state 1 1 1 1 43m
deployment.apps/kube-prometheus-grafana 1 1 1 1 43m
deployment.apps/prometheus-operator 1 1 1 1 49m

NAME DESIRED CURRENT READY AGE
replicaset.apps/kube-prometheus-exporter-kube-state-658f46b8dd 0 0 0 43m
replicaset.apps/kube-prometheus-exporter-kube-state-66b8849c9b 1 1 1 42m
replicaset.apps/kube-prometheus-grafana-f869c754 1 1 1 43m
replicaset.apps/prometheus-operator-5db9df7ffc 1 1 1 49m

NAME DESIRED CURRENT AGE
statefulset.apps/alertmanager-kube-prometheus 1 1 43m
statefulset.apps/prometheus-kube-prometheus 1 1 43m

访问Prometheus-Operator

  • 部署时已经定义alertmanager、prometheus、grafana的Service为NodePort
  • 根据检查部署的情况,得知
    • kube-prometheus的NodePort为30900
    • kube-prometheus-alertmanager的NodePort为30903
    • kube-prometheus-grafana的NodePort为30164
  • 直接通过这些端口访问即可
  • grafana已内嵌了基础的Dashboard模板,以admin用户登录即可见

EFK

说明

  • 官方提供简单的fluentd-elasticsearch样例,可以作为测试用途
  • 已经包含在kubernetes项目当中链接
  • 这里使用kubernetes-server-linux-amd64.tar.gz里面的kubernetes-src.tar.gz提供的Addons
  • 修改elasticsearch使用rook-ceph提供的StorageClass作为持久化存储,默认是使用emptyDir

注意

  • EFK集群部署之后,kibana和elasticsearch初始化过程会极大地消耗服务器资源
  • 请保证你的环境能撑的住!!!!
  • 配置不够,服务器真的会失去响应
  • 实测3节点4C 16G SSD硬盘,CPU持续十几分钟的满载

解压源代码

1
2
3
4
5
6
7
8
9
10
tar xzf kubernetes-server-linux-amd64.tar.gz kubernetes/kubernetes-src.tar.gz
cd kubernetes
tar xzf kubernetes-src.tar.gz \
cluster/addons/fluentd-elasticsearch/es-service.yaml \
cluster/addons/fluentd-elasticsearch/es-statefulset.yaml \
cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml \
cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml \
cluster/addons/fluentd-elasticsearch/kibana-deployment.yaml \
cluster/addons/fluentd-elasticsearch/kibana-service.yaml

切换工作目录

1
cd cluster/addons/fluentd-elasticsearch/

修改yaml文件

  • 删除es-statefulset.yaml里面的emptyDir卷定义字段,位置大概在100行左右
1
2
3
volumes:
- name: elasticsearch-logging
emptyDir: {}
  • 添加volumeClaimTemplates字段,声明使用rook-ceph提供的StorageClass,大小5Gi
  • 位置在StatefulSet.spec,大概67行左右
1
2
3
4
5
6
7
8
9
volumeClaimTemplates:
- metadata:
name: elasticsearch-logging
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "rook-ceph-block"
resources:
requests:
storage: 5Gi
  • 修改后,es-statefulset.yaml内容如下
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
name: elasticsearch-logging
namespace: kube-system
labels:
k8s-app: elasticsearch-logging
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: elasticsearch-logging
labels:
k8s-app: elasticsearch-logging
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
- ""
resources:
- "services"
- "namespaces"
- "endpoints"
verbs:
- "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
namespace: kube-system
name: elasticsearch-logging
labels:
k8s-app: elasticsearch-logging
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
name: elasticsearch-logging
namespace: kube-system
apiGroup: ""
roleRef:
kind: ClusterRole
name: elasticsearch-logging
apiGroup: ""
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch-logging
namespace: kube-system
labels:
k8s-app: elasticsearch-logging
version: v5.6.4
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
serviceName: elasticsearch-logging
replicas: 2
selector:
matchLabels:
k8s-app: elasticsearch-logging
version: v5.6.4
volumeClaimTemplates:
- metadata:
name: elasticsearch-logging
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: rook-ceph-block
resources:
requests:
storage: 5Gi
template:
metadata:
labels:
k8s-app: elasticsearch-logging
version: v5.6.4
kubernetes.io/cluster-service: "true"
spec:
serviceAccountName: elasticsearch-logging
containers:
- image: gcrxio/elasticsearch:v5.6.4
name: elasticsearch-logging
resources:
# need more cpu upon initialization, therefore burstable class
limits:
cpu: 1000m
requests:
cpu: 100m
ports:
- containerPort: 9200
name: db
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
volumeMounts:
- name: elasticsearch-logging
mountPath: /data
env:
- name: "NAMESPACE"
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# Elasticsearch requires vm.max_map_count to be at least 262144.
# If your OS already sets up this number to a higher value, feel free
# to remove this init container.
initContainers:
- image: alpine:3.6
command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
name: elasticsearch-logging-init
securityContext:
privileged: true
  • 注释掉kibana-deployment.yaml里定义的环境变量
  • 大概在35行左右
1
2
# - name: SERVER_BASEPATH
# value: /api/v1/namespaces/kube-system/services/kibana-logging/proxy

修改镜像地址

  • 默认yaml定义的镜像地址是k8s.gcr.io,需要科学上网
  • 变更成gcrxio
1
sed -e 's,k8s.gcr.io,gcrxio,g' -i *yaml

给节点打Label

  • fluentd-es-ds.yaml的nodeSelector字段定义了运行在带有beta.kubernetes.io/fluentd-ds-ready: "true"标签的节点上
  • 这里为了方便,直接给所有节点都打上标签
1
kubectl label node --all beta.kubernetes.io/fluentd-ds-ready=true

部署EFK

1
kubectl apply -f .

查看部署情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
kubectl -n kube-system get pod -l k8s-app=elasticsearch-logging
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 1 10m
elasticsearch-logging-1 1/1 Running 0 10m

kubectl -n kube-system get pod -l k8s-app=kibana-logging
NAME READY STATUS RESTARTS AGE
kibana-logging-56fb9d765-l95kj 1/1 Running 1 37m

kubectl -n kube-system get pod -l k8s-app=fluentd-es
NAME READY STATUS RESTARTS AGE
fluentd-es-v2.0.4-2mwz7 1/1 Running 0 3m
fluentd-es-v2.0.4-7mk4d 1/1 Running 0 3m
fluentd-es-v2.0.4-zqtpc 1/1 Running 0 3m

kubectl -n kube-system get svc -l k8s-app=elasticsearch-logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch-logging ClusterIP 10.111.107.21 <none> 9200/TCP 39m

kubectl -n kube-system get svc -l k8s-app=kibana-logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kibana-logging ClusterIP 10.96.170.77 <none> 5601/TCP 39m

访问EFK

  • 修改elasticsearch和kibana的svc类型为NodePort
1
2
kubectl patch -n kube-system svc elasticsearch-logging -p '{"spec":{"type":"NodePort"}}'
kubectl patch -n kube-system svc kibana-logging -p '{"spec":{"type":"NodePort"}}'
  • 查看分配的nodePort
1
2
3
4
5
6
7
kubectl -n kube-system get svc -l k8s-app=elasticsearch-logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch-logging NodePort 10.111.107.21 <none> 9200:30542/TCP 42m

kubectl -n kube-system get svc -l k8s-app=kibana-logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kibana-logging NodePort 10.96.170.77 <none> 5601:30998/TCP 42m
  • 可以看到端口分别为30542和30998,可以用下面的curl命令快速验证
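可以先用curl直接访问elasticsearch的NodePort,确认集群健康状态和fluentd写入的索引(30542为上面示例分配的端口,节点IP与端口请按实际值替换):

# 查看ES集群健康状态,status为green或yellow即基本正常
curl -s http://172.16.80.201:30542/_cluster/health?pretty
# 查看fluentd写入的logstash-*索引
curl -s "http://172.16.80.201:30542/_cat/indices?v"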

在github上获取yaml文件

  • 如果不想用kubernetes-src.tar.gz里面的Addons
  • 可以直接下载github上面的文件,也是一样的
1
2
3
4
5
6
wget https://raw.githubusercontent.com/kubernetes/kubernetes/${KUBERNETES_VERSION}/cluster/addons/fluentd-elasticsearch/es-service.yaml
wget https://raw.githubusercontent.com/kubernetes/kubernetes/${KUBERNETES_VERSION}/cluster/addons/fluentd-elasticsearch/es-statefulset.yaml
wget https://raw.githubusercontent.com/kubernetes/kubernetes/${KUBERNETES_VERSION}/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml
wget https://raw.githubusercontent.com/kubernetes/kubernetes/${KUBERNETES_VERSION}/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
wget https://raw.githubusercontent.com/kubernetes/kubernetes/${KUBERNETES_VERSION}/cluster/addons/fluentd-elasticsearch/kibana-deployment.yaml
wget https://raw.githubusercontent.com/kubernetes/kubernetes/${KUBERNETES_VERSION}/cluster/addons/fluentd-elasticsearch/kibana-service.yaml

本文至此结束